Preface to the Third Edition


The field of matrix computations continues to grow and mature. In 
the Third Edition we have added over 300 new references and 100 new 
problems. The LINPACK and EISPACK citations have been replaced with 
appropriate pointers to LAPACK with key codes tabulated at the beginning 
of appropriate chapters. 

In the First Edition and Second Edition we identified a small number 
of global references: Wilkinson (1965), Forsythe and Moler (1967), Stewart 
(1973), Hanson and Lawson (1974) and Parlett (1980). These volumes are 
as important as ever to the research landscape, but there are some mag- 
nificent new textbooks and monographs on the scene. See The Literature 
section that follows. 

We continue as before with the practice of giving references at the end 
of each section and a master bibliography at the end of the book. 

The earlier editions suffered from a large number of typographical errors 
and we are obliged to the dozens of readers who have brought these to our 
attention. Many corrections and clarifications have been made. 

Here are some specific highlights of the new edition. Chapter 1 (Matrix 
Multiplication Problems) and Chapter 6 (Parallel Matrix Computations) 
have been completely rewritten with less formality. We think that this 
facilitates the building of intuition for high performance computing and 
draws a better line between algorithm and implementation on the printed 
page. 

In Chapter 2 (Matrix Analysis) we expanded the treatment of CS de- 
composition and included a proof. The overview of floating point arithmetic 
has been brought up to date. In Chapter 4 (Special Linear Systems) we 
embellished the Toeplitz section with connections to circulant matrices and 
the fast Fourier transform. A subsection on equilibrium systems has been 
included in our treatment of indefinite systems. 

A more accurate rendition of the modified Gram-Schmidt process is
offered in Chapter 5 (Orthogonalization and Least Squares). Chapter 8 
(The Symmetric Eigenproblem) has been extensively rewritten and rear- 
ranged so as to minimize its dependence upon Chapter 7 (The Unsymmet- 
ric Eigenproblem). Indeed, the coupling between these two chapters is now 
so minimal that it is possible to read either one first.

In Chapter 9 (Lanczos Methods) we have expanded the discussion of
the unsymmetric Lanczos process and the Arnoldi iteration. The “unsymmetric
component” of Chapter 10 (Iterative Methods for Linear Systems)
has likewise been broadened with a whole new section devoted to various 
Krylov space methods designed to handle the sparse unsymmetric linear 
system problem. 

In §12.5 (Updating Orthogonal Decompositions) we included a new sub- 
section on ULV updating. Toeplitz matrix eigenproblems and orthogonal 
matrix eigenproblems are discussed in §12.6.

Both of us look forward to continuing the dialog with our readers. As 
we said in the Preface to the Second Edition, "It has been a pleasure to 
deal with such an interested and friendly readership." 

Many individuals made valuable Third Edition suggestions, but Greg 
Ammar, Mike Heath, Nick Trefethen, and Steve Vavasis deserve special 
thanks. 

Finally, we would like to acknowledge the support of Cindy Robinson 
at Cornell. A dedicated assistant makes a big difference. 


Software 


LAPACK 


Many of the algorithms in this book are implemented in the software pack- 
age LAPACK: 


E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz,
A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. 
Sorensen (1995). LAPACK Users’ Guide, Release 2.0, 2nd ed., SIAM 
Publications, Philadelphia. 


Pointers to some of the more important routines in this package are given 
at the beginning of selected chapters: 


Chapter 1. Level-1, Level-2, Level-3 BLAS 

Chapter 3. General Linear Systems 

Chapter 4. Positive Definite and Band Systems 

Chapter 5. Orthogonalization and Least Squares Problems 
Chapter 7. The Unsymmetric Eigenvalue Problem 
Chapter 8. The Symmetric Eigenvalue Problem 


Our LAPACK references are spare in detail but rich enough to "get you 
started." Thus, when we say that _TRSV can be used to solve a triangular 
system Ax = b, we leave it to you to discover through the LAPACK manual
that A can be either upper or lower triangular and that the transposed
system A^T x = b can be handled as well. Moreover, the underscore is a
placeholder whose mission is to designate type (single, double, complex,
etc.).
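
To make the options just mentioned concrete, here is a minimal pure-Python sketch of a triangular solve that accepts either an upper or a lower triangular matrix and can work with the transposed system. The function name trsv and the keyword arguments lower and trans are hypothetical choices made for this illustration. They are not the calling sequence of the library routine, which takes further arguments and should be looked up in the manual.

```python
def trsv(A, b, lower=False, trans=False):
    """Solve T x = b, where T = A, or T = A^T when trans is True.

    Illustrative helper only (not the BLAS interface). A is a square
    list of lists assumed to be triangular with a nonzero diagonal.
    """
    n = len(b)
    x = [0.0] * n

    def t(i, j):
        # Entry (i, j) of the effective matrix T.
        return A[j][i] if trans else A[i][j]

    # T is lower triangular when exactly one of `lower`, `trans` is set,
    # in which case forward substitution applies. Otherwise T is upper
    # triangular and back substitution is used.
    forward = lower != trans
    rows = range(n) if forward else range(n - 1, -1, -1)
    for i in rows:
        known = range(i) if forward else range(i + 1, n)
        s = sum(t(i, j) * x[j] for j in known)
        x[i] = (b[i] - s) / t(i, i)
    return x


U = [[2.0, 1.0],
     [0.0, 4.0]]
b = [4.0, 8.0]
x_upper = trsv(U, b)              # solves U x = b
x_trans = trsv(U, b, trans=True)  # solves U^T x = b
```

The sketch returns a new list rather than overwriting its input, a choice made here only for readability.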

LAPACK stands on the shoulders of two other packages that are mile- 
stones in the history of software development. EISPACK was developed in 
the early 1970s and is dedicated to solving symmetric, unsymmetric, and 
generalized eigenproblems: 


B.T. Smith, J.M. Boyle, Y. Ikebe, V.C. Klema, and C.B. Moler (1970). 
Matrix Eigensystem Routines: EISPACK Guide, 2nd ed., Lecture Notes
in Computer Science, Volume 6, Springer-Verlag, New York.

B.S. Garbow, J.M. Boyle, J.J. Dongarra, and C.B. Moler (1972). Matrix
Eigensystem Routines: EISPACK Guide Extension, Lecture Notes in 
Computer Science, Volume 51, Springer-Verlag, New York. 


LINPACK was developed in the late 1970s for linear equations and least 
squares problems.


EISPACK and LINPACK have their roots in a sequence of papers that feature
Algol implementations of some of the key matrix factorizations. These 
papers are collected in 


J.H. Wilkinson and C. Reinsch, eds. (1971). Handbook for Automatic 
Computation, Vol. 2, Linear Algebra, Springer-Verlag, New York. 


NETLIB


A wide range of software including LAPACK, EISPACK, and LINPACK is
available electronically via Netlib: 


World Wide Web:  http://www.netlib.org/index.html 
Anonymous ftp: ftp://ftp.netlib.org 


Via email, send a one-line message: 


mail netlib@ornl.gov
send index 


to get started.


MATLAB®


Complementing LAPACK and defining a very popular matrix computation 
environment is MATLAB:


MATLAB User's Guide, The MathWorks Inc., Natick, Massachusetts. 


M. Marcus (1993). Matrices and MATLAB: A Tutorial, Prentice Hall, Upper
Saddle River, NJ.


R. Pratap (1995). Getting Started with MATLAB, Saunders College Publishing,
Fort Worth, TX.


Many of the problems in Matrix Computations are best posed to students
as MATLAB problems. We make extensive use of MATLAB notation in the
presentation of algorithms. 


Selected References 


Each section in the book concludes with an annotated list of references. 
A master bibliography is given at the end of the text. 

Useful books that collectively cover the field are cited below. Chapter
titles are included if appropriate but do not infer too much from the level 
of detail because one author's chapter may be another's subsection. The 
citations are classified as follows: 


Pre-1970 Classics. Early volumes that set the stage.
Introductory (General). Suitable for the undergraduate classroom.
Advanced (General). Best for practitioners and graduate students.
Analytical. For the supporting mathematics.
Linear Equation Problems. Ax = b.
Linear Fitting Problems. Ax ≈ b.
Eigenvalue Problems. Ax = λx.
High Performance. Parallel/vector issues.
Edited Volumes. Useful, thematic collections.


Within each group the entries are specified in chronological order.


Pre-1970 Classics


V.N. Faddeeva (1959). Computational Methods of Linear Algebra, Dover, 
New York. 


Basic Material from Linear Algebra. Systems of Linear Equations. The Proper
Numbers and Proper Vectors of a Matrix.


E. Bodewig (1959). Matrix Calculus, North Holland, Amsterdam.


Matrix Calculus. Direct Methods for Linear Equations. Indirect Methods for Linear 
Equations. Inversion of Matrices. Geodetic Matrices. Eigenproblems. 


R.S. Varga (1962). Matrix Iterative Analysis, Prentice-Hall, Englewood 
Cliffs, NJ. 
Matrix Properties and Concepts. Nonnegative Matrices. Basic Iterative Methods
and Comparison Theorems. Successive Overrelaxation Iterative Methods. Semi- 
Iterative Methods. Derivation and Solution of Elliptic Difference Equations. Alter- 
nating Direction Implicit Iterative Methods. Matrix Methods for Parabolic Partial 
Differential Equations. Estimation of Acceleration Parameters.


Matrix Computations.


A.S. Householder (1964). Theory of Matrices in Numerical Analysis, Blaisdell,
New York. Reprinted in 1974 by Dover, New York.


Some Basic Identities and Inequalities. Norms, Bounds, and Convergence. Localization
Theorems and Other Inequalities. The Solution of Linear Systems: Methods of


L. Fox (1964). An Introduction to Numerical Linear Algebra, Oxford University
Press, Oxford, England.


Introduction. Matrix Algebra. Elimination Methods of Gauss, Jordan, and Aitken.
Compact Elimination Methods of Doolittle, Crout, Banachiewicz, and Cholesky.
Orthogonalization Methods. Condition, Accuracy, and Precision. Comparison of
Methods, Measure of Work. Iterative and Gradient Methods. Iterative Methods for
Latent Roots and Vectors. Transformation Methods for Latent Roots and Vectors.
Notes on Error Analysis for Latent Roots and Vectors.


Theoretical Background. Perturbation Theory. Error Analysis. Solution of Linear
Algebraic Equations. Hermitian Matrices. Reduction of a General Matrix to
Condensed Form. Eigenvalues of Matrices of Condensed Forms. The LR and QR 
Algorithms. Iterative Methods. 


G.E. Forsythe and C. Moler (1967). Computer Solution of Linear Algebraic
Form of a Matrix Under Orthogonal Equivalence. Proof of Diagonal Form Theorem.
tered in Practical Problems. Sources of Computational Problems of Linear Algebra.
Condition of a Linear System. Gaussian Elimination and LU Decomposition. Need
for Interchanging Rows. Scaling Equations and Unknowns. The Crout and Doolittle
Variants. Iterative Improvement. Computing the Determinant. Nearly Singular
Matrices. Algol 60 Program. Fortran, Extended Algol, and PL/I Programs. Ma-
Rounding Error in Gaussian Elimination. Convergence of Iterative Improvement.
Positive Definite Matrices; Band Matrices. Iterative Methods for Solving Linear
Systems. Nonlinear Systems of Equations.




Introductory (General) 


A.R. Gourlay and G.A. Watson (1973). Computational Methods for Matrix
Eigenproblems, John Wiley & Sons, New York. 
Introduction. Background Theory. Reductions and Transformations. Methods for 
the Dominant Eigenvalue. Methods for the Subdominant Eigenvalue. Inverse It- 
eration. Jacobi's Methods. Givens and Householder’s Methods. Eigensystem of 
a Symmetric Tridiagonal Matrix. The LR and QR Algorithms. Extensions of Ja- 
cobi's Method. Extension of Givens’ and Householder's Methods. QR Algorithm for 
Hessenberg Matrices. Generalized Eigenvalue Problems. Available Implementations.


G.W. Stewart (1973). Introduction to Matrix Computations, Academic
Press, New York.
Preliminaries. Practicalities. The Direct Solution of Linear Systems. Norms, Limits,
and Condition Numbers. The Linear Least Squares Problem. Eigenvalues and
Eigenvectors. The QR Algorithm.


R.J. Goult, R.F. Hoskins, J.A. Milner and M.J. Pratt (1974). Computa- 
tional Methods in Linear Algebra, John Wiley and Sons, New York. 
Eigenvalues and Eigenvectors. Error Analysis. The Solution of Linear Equations by 
Elimination and Decomposition Methods. The Solution of Linear Systems of Equations
by Iterative Methods. Errors in the Solution Sets of Equations. Computation
of Eigenvalues and Eigenvectors. Errors in Eigenvalues and Eigenvectors. Appendix
- A Survey of Essential Results from Linear Algebra.


T.F. Coleman and C.F. Van Loan (1988). Handbook for Matrix Computations,
SIAM Publications, Philadelphia, PA.


Fortran 77, The Basic Linear Algebra Subprograms, Linpack, MATLAB. 


W.W. Hager (1988). Applied Numerical Linear Algebra, Prentice-Hall, En- 
glewood Cliffs, NJ. 
Introduction. Elimination Schemes. Conditioning. Nonlinear Systems. Least
Squares. Eigenproblems. Iterative Methods.


P.G. Ciarlet (1989). Introduction to Numerical Linear Algebra and Opti- 
misation, Cambridge University Press. 
A Summary of Results on Matrices. General Results in the Numerical Analysis of 
Matrices. Sources of Problems in the Numerical Analysis of Matrices. Direct Meth- 
ods for the Solution of Linear Systems. Iterative Methods for the Solution of Linear 
Systems. Methods for the Calculation of Eigenvalues and Eigenvectors. A Review of 
Differential Calculus. Some Applications. General Results on Optimization. Some 
Algorithms. Introduction to Nonlinear Programming. Linear Programming.


D.S. Watkins (1991). Fundamentals of Matrix Computations, John Wiley
and Sons, New York. 
Gaussian Elimination and Its Variants. Sensitivity of Linear Systems; Effects of 
Roundoff Errors. Orthogonal Matrices and the Least-Squares Problem. Eigenvalues
and Eigenvectors I. Eigenvalues and Eigenvectors II. Other Methods for the Symmetric
Eigenvalue Problem. The Singular Value Decomposition.




P. Gill, W. Murray, and M.H. Wright (1991). Numerical Linear Algebra 
and Optimization, Vol. 1, Addison-Wesley, Reading, MA. 
Introduction. Linear Algebra Background. Computation and Condition. Linear 
Equations. Compatible Systems. Linear Least Squares. Linear Constraints I: Linear 
Programming. The Simplex Method. 


A. Jennings and J.J. McKeown (1992). Matrix Computation (2nd ed.),
John Wiley and Sons, New York. 


Basic Algebraic and Numerical Concepts. Some Matrix Problems. Computer Imple- 
mentation. Elimination Methods for Linear Equations. Sparse Matrix Elimination. 
Some Matrix Eigenvalue Problems. Transformation Methods for Eigenvalue Problems.
Sturm Sequence Methods. Vector Iterative Methods for Partial Eigensolution.
Orthogonalization and Re-Solution Techniques for Linear Equations. Iterative Methods
for Linear Equations. Non-linear Equations. Parallel and Vector Computing.


B.N. Datta (1995). Numerical Linear Algebra and Applications, Brooks/Cole
Publishing Company, Pacific Grove, California.

Review of Required Linear Algebra Concepts. Floating Point Numbers and Errors in 
Computations. Stability of Algorithms and Conditioning of Problems. Numerically 
Effective Algorithms and Mathematical Software. Some Useful Transformations in 
Numerical Linear Algebra and Their Applications. Numerical Matrix Eigenvalue
Problems. The Generalized Eigenvalue Problem. The Singular Value Decomposition. 
A Taste of Roundoff Error Analysis. 


M.T. Heath (1997). Scientific Computing: An Introductory Survey, McGraw- 
Hill, New York. 


Scientific Computing. Systems of Linear Equations. Linear Least Squares. Eigenvalues
and Singular Values. Nonlinear Equations. Optimization. Interpolation. Numerical
Integration and Differentiation. Initial Value Problems for ODEs. Boundary
Value Problems for ODEs. Partial Differential Equations. Fast Fourier Transform.
Random Numbers and Simulation. 


C.F. Van Loan (1997). Introduction to Scientific Computing: A Matrix-Vector
Approach Using Matlab, Prentice Hall, Upper Saddle River, NJ.


Power Tools of the Trade. Polynomial Interpolation. Piecewise Polynomial Interpolation.
Numerical Integration. Matrix Computations. Linear Systems. The QR and
Cholesky Factorizations. Nonlinear Equations and Optimisation. The Initial Value
Problem. 


Advanced (General) 


N.J. Higham (1996). Accuracy and Stability of Numerical Algorithms, 
SIAM Publications, Philadelphia, PA.


Principles of Finite Precision Computation. Floating Point Arithmetic. Basics.
Summation. Polynomials. Norms. Perturbation Theory for Linear Systems. Triangular
Systems. LU Factorization and Linear Equations. Cholesky Factorization.
Iterative Refinement. Block LU Factorization. Matrix Inversion. Condition Number
Estimation. The Sylvester Equation. Stationary Iterative Methods. Matrix Powers.
QR Factorization. The Least Squares Problem. Underdetermined Systems. Vandermonde
Systems. Fast Matrix Multiplication. The Fast Fourier Transform and
Applications. Automatic Error Analysis. Software Issues in Floating Point Arithmetic.
A Gallery of Test Matrices.




J.W. Demmel (1996). Numerical Linear Algebra, SIAM Publications, Philadel- 
phia, PA. 
Introduction. Linear Equation Solving. Linear Least Squares Problems. Nonsymmetric
Eigenvalue Problems. The Symmetric Eigenproblem and Singular Value Decomposition.
Iterative Methods for Linear Systems and Eigenvalue Problems. Iterative
Algorithms for Eigenvalue Problems.


L.N. Trefethen and D. Bau III (1997). Numerical Linear Algebra, SIAM
Publications, Philadelphia, PA. 


Matrix-Vector Multiplication. Orthogonal Vectors and Matrices. Norms. The Singular
Value Decomposition. More on the SVD. Projectors. QR Factorization. Gram-Schmidt
Orthogonalization. MATLAB. Householder Triangularization. Least-Squares
Problems. Conditioning and Condition Numbers. Floating Point Arithmetic. Stability.
More on Stability. Stability of Householder Triangularization. Stability of Back
Substitution. Conditioning of Least-Squares Problems. Stability of Least-Squares
Algorithms. Gaussian Elimination. Pivoting. Stability of Gaussian Elimination.
Cholesky Factorization. Eigenvalue Problems. Overview of Eigenvalue Algorithms.
Reduction to Hessenberg/Tridiagonal Form. Rayleigh Quotient, Inverse Iteration.
QR Algorithm Without Shifts. QR Algorithm With Shifts. Other Eigenvalue Algorithms.
Computing the SVD. Overview of Iterative Methods. The Arnoldi Iteration.
How Arnoldi Locates Eigenvalues. GMRES. The Lanczos Iteration. Orthogonal
Polynomials and Gauss Quadrature. Conjugate Gradients. Biorthogonalization
Methods. Preconditioning. The Definition of Numerical Analysis.


Analytical 


F.R. Gantmacher (1959). The Theory of Matrices Vol. 1, Chelsea, New 
York. 
Matrices and Operations on Matrices. The Algorithm of Gauss and Some of its
Applications. Linear Operators in an n-dimensional Vector Space. The Characteristic
Polynomial and the Minimum Polynomial of a Matrix. Functions of Matrices.
Equivalent Transformations of Polynomial Matrices. Analytic Theory of Elementary
Divisors. The Structure of a Linear Operator in an n-dimensional Space. Matrix
Equations. Linear Operators in a Unitary Space. Quadratic and Hermitian Forms.


F.R. Gantmacher (1959). The Theory of Matrices Vol. 2, Chelsea, New
York. 
Complex Symmetric, Skew-Symmetric, and Orthogonal Matrices. Singular Pencils 
of Matrices. Matrices with Nonnegative Elements. Application of the Theory of Matrices
to the Investigation of Systems of Linear Differential Equations. The Problem
of Routh-Hurwitz and Related Questions. 


A. Berman and R.J. Plemmons (1979). Nonnegative Matrices in the Math- 
ematical Sciences, Academic Press, New York. Reprinted with additions 
in 1994 by SIAM Publications, Philadelphia, PA. 

Matrices Which Leave a Cone Invariant. Nonnegative Matrices. Semigroups of Nonnegative
Matrices. Symmetric Nonnegative Matrices. Generalized Inverse-Positivity.
M-Matrices. Iterative Methods for Linear Systems. Finite Markov Chains. Input-Output
Analysis in Economics. The Linear Complementarity Problem.




G.W. Stewart and J. Sun (1990). Matrix Perturbation Theory, Academic
Press, San Diego. 


Preliminaries. Norms and Metrics. Linear Systems and Least Squares Problems. The
Perturbation of Eigenvalues. Invariant Subspaces. Generalized Eigenvalue Problems.


R. Horn and C. Johnson (1985). Matrix Analysis, Cambridge University
Press, New York. 


Review and Miscellanea. Eigenvalues, Eigenvectors, and Similarity. Unitary Equivalence
and Normal Matrices. Canonical Forms. Hermitian and Symmetric Matrices.
Norms for Vectors and Matrices. Location and Perturbation of Eigenvalues. Positive
Definite Matrices.


R. Horn and C. Johnson (1991). Topics in Matrix Analysis, Cambridge
University Press, New York. 


The Field of Values. Stable Matrices and Inertia. Singular Value Inequalities. Matrix
Equations and the Kronecker Product. The Hadamard Product. Matrices and
Functions.


Linear Equation Problems 


D.M. Young (1971). Iterative Solution of Large Linear Systems, Academic 
Press, New York. 


Introduction. Matrix Preliminaries. Linear Stationary Iterative Methods. Convergence
of the Basic Iterative Methods. Eigenvalues of the SOR Method for Consistently
Ordered Matrices. Determination of the Optimum Relaxation Parameter.
Norms of the SOR Method. The Modified SOR Method: Fixed Parameters. Nonsta- 
tionary Linear Iterative Methods. The Modified SOR Method: Variable Parameters. 
Semi-iterative Methods. Extensions of the SOR Theory; Stieltjes Matrices. Gener- 
alized Consistently Ordered Matrices. Group Iterative Methods. Symmetric SOR 
Method and Related Methods. Second Degree Methods. Alternating Direction Im- 
plicit Methods. Selection of an Iterative Method. 


L.A. Hageman and D.M. Young (1981). Applied Iterative Methods, Aca- 
demic Press, New York. 


Background on Linear Algebra and Related Topics. Background on Basic Iterative 
Methods. Polynomial Acceleration. Chebyshev Acceleration. An Adaptive Cheby- 
shev Procedure Using Special Norms. Adaptive Chebyshev Acceleration. Conjugate 
Gradient Acceleration. Special Methods for Red/Black Partitionings. Adaptive Pro- 
cedures for Successive Overrelaxation Method. The Use of Iterative Methods in the 
Solution of Partial Differential Equations. Case Studies. The Nonsymmetrizable
Case.




A. George and J. W-H. Liu (1981). Computer Solution of Large Sparse 
Positive Definite Systems, Prentice-Hall Inc., Englewood Cliffs, New
Jersey. 

Introduction. Fundamentals. Some Graph Theory Notation and Its Use in the 
Study of Sparse Symmetric Matrices. Band and Envelope Methods. General Sparse
Methods. Quotient Tree Methods for Finite Element and Finite Difference Problems.
One-Way Dissection Methods for Finite Element Problems. Nested Dissection
Methods. Numerical Experiments.


S. Pissanetsky (1984). Sparse Matrix Technology, Academic Press, New
York. 


Fundamentals. Linear Algebraic Equations. Numerical Errors in Gaussian Elimination.
Ordering for Gauss Elimination: Symmetric Matrices. Ordering for Gauss
Elimination: General Matrices. Sparse Eigenanalysis. Sparse Matrix Algebra.
Connectivity and Nodal Assembly. General Purpose Algorithms.


I.S. Duff, A.M. Erisman, and J.K. Reid (1986). Direct Methods for Sparse
Matrices, Oxford University Press, New York. 


Introduction. Sparse Matrices: Storage Schemes and Simple Operations. Gaussian
Elimination for Dense Matrices: The Algebraic Problem. Gaussian Elimination
for Dense Matrices: Numerical Considerations. Gaussian Elimination for Sparse
Matrices: An Introduction. Reduction to Block Triangular Form. Local Pivotal
Strategies for Sparse Matrices. Ordering Sparse Matrices to Special Forms.
Implementing Gaussian Elimination: Analyse with Numerical Values. Implementing
Gaussian Elimination with Symbolic Analyse. Partitioning, Matrix Modification, 
and Tearing. Other Sparsity-Oriented Issues. 


R. Barrett, M. Berry, T.F. Chan, J. Demmel, J. Donato, J. Dongarra, V. 
Eijkhout, R. Pozo, C. Romine, H. van der Vorst (1993). Templates for 
the Solution of Linear Systems: Building Blocks for Iterative Methods, 
SIAM Publications, Philadelphia, PA. 


Introduction. Why Use Templates? What Methods are Covered? Iterative Methods. 
Stationary Methods. Nonstationary Iterative Methods. Survey of Recent Krylov 
Methods. Jacobi, Incomplete, SSOR, and Polynomial Preconditioners. Complex 
Systems. Stopping Criteria. Data Structures. Parallelism. The Lanczos Connection. 
Block Iterative Methods. Reduced System Preconditioning. Domain Decomposition 
Methods. Multigrid Methods. Row Projection Methods.


W. Hackbusch (1994). Iterative Solution of Large Sparse Systems of Equa- 
tions, Springer-Verlag, New York. 


Introduction. Recapitulation of Linear Algebra. Iterative Methods. Methods of
Jacobi and Gauss-Seidel and SOR Iteration in the Positive Definite Case. Analysis
in the 2-Cyclic Case. Analysis for M-Matrices. Semi-Iterative Methods. Transformations,
Secondary Iterations, Incomplete Triangular Decompositions. Conjugate
Gradient Methods. Multi-Grid Methods. Domain Decomposition Methods.




O. Axelsson (1994). Iterative Solution Methods, Cambridge University 
Press. 


Direct Solution Methods. Theory of Matrix Eigenvalues. Positive Definite Matrices,
Schur Complements, and Generalized Eigenvalue Problems. Reducible and Irreducible
Matrices and the Perron-Frobenius Theory for Nonnegative Matrices. Basic
Iterative Methods and Their Rates of Convergence. M-Matrices, Convergent Splittings,
and the SOR Method. Incomplete Factorization Preconditioning Methods.
Approximate Matrix Inverses and Corresponding Preconditioning Methods. Block
Diagonal and Schur Complement Preconditionings. Estimates of Eigenvalues and
Condition Numbers for Preconditioned Matrices. Conjugate Gradient and Lanczos-Type
Methods. Generalized Conjugate Gradient Methods. The Rate of Convergence
of the Conjugate Gradient Method.


Y. Saad (1996). Iterative Methods for Sparse Linear Systems, PWS Pub- 
lishing Co., Boston. 


Background in Linear Algebra. Discretization of PDEs. Sparse Matrices. Basic 
Iterative Methods. Projection Methods. Krylov Subspace Methods - Part I. Krylov 
Subspace Methods - Part II. Methods Related to the Normal Equations. Preconditioned
Iterations. Preconditioning Techniques. Parallel Implementations. Parallel
Preconditioners. Domain Decomposition Methods.


Linear Fitting Problems 


C.L. Lawson and R.J. Hanson (1974). Solving Least Squares Problems,
Prentice-Hall, Englewood Cliffs, NJ. Reprinted with a detailed "new
developments" appendix in 1996 by SIAM Publications, Philadelphia,
PA. 


Introduction. Analysis of the Least Squares Problem. Orthogonal Decomposition by
Certain Elementary Transformations. Orthogonal Decomposition by Singular Value
Decomposition. Perturbation Theorems for Singular Values. Bounds for the Condition
Number of a Triangular Matrix. The Pseudoinverse. Perturbation Bounds
for the Pseudoinverse. Perturbation Bounds for the Solution of Problem LS.
Numerical Computations Using Elementary Orthogonal Transformations. Computing
the Solution for the Overdetermined or Exactly Determined Full Rank Problem.
Computation of the Covariance Matrix of the Solution Parameters. Computing the
Solution for the Underdetermined Full Rank Problem. Computing the Solution for
Problem LS with Possibly Deficient Pseudorank. Analysis of Computing Errors for
Householder Transformations. Analysis of Computing Errors for the Problem LS.
Analysis of Computing Errors for the Problem LS Using Mixed Precision Arithmetic.
Computation of the Singular Value Decomposition and the Solution of Problem
LS. Other Methods for Least Squares Problems. Linear Least Squares with Linear
Equality Constraints Using a Basis of the Null Space. Linear Least Squares with
Linear Equality Constraints by Direct Elimination. Linear Least Squares with Linear
Equality Constraints by Weighting. Linear Least Squares with Linear Inequality
Constraints. Modifying a QR Decomposition to Add or Remove Column Vectors.
Practical Analysis of Least Squares Problems. Examples of Some Methods of Analyzing
a Least Squares Problem. Modifying a QR Decomposition to Add or Remove
Row Vectors with Application to Sequential Processing of Problems Having Large
or Banded Coefficient Matrix.




R.W. Farebrother (1987). Linear Least Squares Computations, Marcel
Dekker, New York. 


The Gauss and Gauss-Jordan Methods. Matrix Analysis of Gauss’s Method: The
Cholesky and Doolittle Decompositions. The Linear Algebraic Model: The Method
of Averages and the Method of Least Squares. The Cauchy-Bienayme, Laplace,
and Schmidt Procedures. Householder Procedures. Givens Procedures. Updating
the QU Decomposition. Pseudorandom Numbers. The Standard Linear Model.
Condition Numbers. Instrumental Variable Estimators. Generalized Least Squares
Estimation. Iterative Solutions of Linear and Nonlinear Least Squares Problems.
Canonical Expressions for the Least Squares Estimators and Test Statistics. Traditional
Expressions for the Least Squares Updating Formulas and Test Statistics.
Least Squares Estimation Subject to Linear Constraints.


S. Van Huffel and J. Vandewalle (1991). The Total Least Squares Problem:
Computational Aspects and Analysis, SIAM Publications, Philadelphia, 
PA. 

Introduction. Basic Principles of the Total Least Squares Problem. Extensions of the 
Basic Total Least Squares Problem. Direct Speed Improvement of the Total Least 
Squares Computations. Iterative Speed Improvement for Solving Slowly Varying 
Total Least Squares Problems. Algebraic Connections Between Total Least Squares
and Least Squares Problems. Sensitivity Analysis of Total Least Squares and Least
Squares Problems in the Presence of Errors in All Data. Statistical Properties of the
Total Least Squares Problem. Algebraic Connections Between Total Least Squares 
Estimation and Classical Linear Regression in Multicollinearity Problems. Conclu- 
sions. 


Å. Björck (1996). Numerical Methods for Least Squares Problems, SIAM
Publications, Philadelphia, PA. 
Mathematical and Statistical Properties of Least Squares Solutions. Basic Numerical 
Methods. Modified Least Squares Problems. Generalized Least Squares Problems.
Constrained Least Squares Problems. Direct Methods for Sparse Least Squares Prob- 
lems. Iterative Methods for Least Squares Problems. Least Squares with Special 
Bases. Nonlinear Least Squares Problems. 


Eigenvalue Problems 


B.N. Parlett (1980). The Symmetric Eigenvalue Problem, Prentice-Hall,
Englewood Cliffs, NJ.
Basic Facts about Self-Adjoint Matrices. Tasks, Obstacles, and Aids. Counting
Eigenvalues. Simple Vector Iterations. Deflation. Useful Orthogonal Matrices.
Tridiagonal Form. The QL and QR Algorithms. Jacobi Methods. Eigenvalue
Bounds. Approximation from a Subspace. Krylov Subspaces. Lanczos Algorithms.
Subspace Iteration. The General Linear Eigenvalue Problem.


J. Cullum and R.A. Willoughby (1985a). Lanczos Algorithms for Large 
Symmetric Eigenvalue Computations, Vol. I Theory, Birkhäuser, Boston.
Preliminaries: Notation and Definitions. Real Symmetric Problems. Lanczos Pro-


Defective Complex Symmetric Matrices. Block Lanczos Procedures, Real Symmetric
Matrices. 




J. Cullum and R.A. Willoughby (1985b). Lanczos Algorithms for Large
Symmetric Eigenvalue Computations, Vol. II Programs, Birkhäuser,
Boston. 


Lanczos Procedures. Real Symmetric Matrices. Hermitian Matrices. Factored In- 
verses of Real Symmetric Matrices. Real Symmetric Generalized Problems. Real 
Rectangular Problems. Nondefective Complex Symmetric Matrices. Real Symmet- 
ric Matrices, Block Lanczos Code. Factored Inverses, Real Symmetric Matrices, 
Block Lanczos Code.


Y. Saad (1992). Numerical Methods for Large Eigenvalue Problems: Theory 
and Algorithms, John Wiley and Sons, New York. 


Background in Matrix Theory and Linear Algebra. Perturbation Theory and Error
Analysis. The Tools of Spectral Approximation. Subspace Iteration. Krylov
Subspace Methods. Acceleration Techniques and Hybrid Methods. Precondition- 
ing Techniques. Non-Standard Eigenvalue Problems. Origins of Matrix Eigenvalue 
Problems. 


F. Chatelin (1993). Eigenvalues of Matrices, John Wiley and Sons, New 
York. 


Supplements from Linear Algebra. Elements of Spectral Theory. Why Compute 
Eigenvalues. Error Analysis. Foundations of Methods for Computing Eigenvalues. 
Numerical Methods for Large Matrices. Chebyshev's Iterative Methods. 


High Performance 


W. Schönauer (1987). Scientific Computing on Vector Computers, North 
Holland, Amsterdam. 


Introduction. The First Commercially Significant Vector Computer. The Arithmetic 
Performance of the First Commercially Significant Vector Computer. Hockney’s $n_{1/2}$ 
and Timing Formulae. Fortran and Autovectorization. Behavior of Programs. Some 
Basic Algorithms, Recurrences. Matrix Operations. Systems of Linear Equations 
with Full Matrices. Tridiagonal Linear Systems. The Iterative Solution of Linear 
Equations. Special Applications. The Fujitsu VPs and Other Japanese Vector Com- 
puters. The Cray-2. The IBM VF and Other Vector Processors. The Convex C1. 


R.W. Hockney and C.R. Jesshope (1988). Parallel Computers 2, Adam 
Hilger, Bristol and Philadelphia. 


Introduction. Pipelined Computers. Processor Arrays. Parallel Languages. Parallel 
Algorithms. Future Developments. 


J.J. Modi (1988). Parallel Algorithms and Matrix Computation, Oxford 
University Press, Oxford. 


General Principles of Parallel Computing. Parallel Techniques and Algorithms. Par- 
allel Sorting Algorithms. Solution of a System of Linear Algebraic Equations. The 
Symmetric Eigenvalue Problem: Jacobi's Method. QR Factorization. Singular Value 
Decomposition and Related Problems. 




J. Ortega (1988). Introduction to Parallel and Vector Solution of Linear 
Systems, Plenum Press, New York. 


Introduction. Direct Methods for Linear Equations. Iterative Methods for Linear 
Equations. 


J. Dongarra, I. Duff, D. Sorensen, and H. van der Vorst (1990). Solving 
Linear Systems on Vector and Shared Memory Computers, SIAM Publications, 
Philadelphia, PA. 
Vector and Parallel Processing. Overview of Current High-Performance Computers. 
Implementation Details and Overhead. Performance Analysis, Modeling, and 
Measurements. Building Blocks in Linear Algebra. Direct Solution of Sparse Linear 
Systems. Iterative Solution of Sparse Linear Systems. 


Y. Robert (1990). The Impact of Vector and Parallel Architectures on the 
Gaussian Elimination Algorithm, Halsted Press, New York. 
Introduction. Vector and Parallel Architectures. Vector Multiprocessor Computing. 
Hypercube Computing. Systolic Computing. Task Graph Scheduling. Analysis of 
Distributed Algorithms. Design Methodologies. 


G.H. Golub and J.M. Ortega (1993). Scientific Computing: An Introduc- 
tion with Parallel Computing, Academic Press, Boston. 
The World of Scientific Computing. Linear Algebra. Parallel and Vector Computing. 
Polynomial Approximation. Continuous Problems Solved Discretely. Direct Solution 
of Linear Equations. Parallel Direct Methods. Iterative Methods. Conjugate 
Gradient-Type Methods. 


Edited Volumes 


D.J. Rose and R.A. Willoughby, eds. (1972). Sparse Matrices and Their 
Applications, Plenum Press, New York. 


J.R. Bunch and D.J. Rose, eds. (1976). Sparse Matrix Computations, 
Academic Press, New York. 


I.S. Duff and G.W. Stewart, eds. (1979). Sparse Matrix Proceedings, 1978, 
SIAM Publications, Philadelphia, PA. 


I.S. Duff, ed. (1981). Sparse Matrices and Their Uses, Academic Press, 
New York. 


A. Björck, R.J. Plemmons, and H. Schneider, eds. (1981). Large-Scale 
Matrix Problems, North-Holland, New York. 


G. Rodrigue, ed. (1982). Parallel Computation, Academic Press, New 
York. 




B. Kågström and A. Ruhe, eds. (1983). Matrix Pencils, Proc. Pite Havsbad, 
1982, Lecture Notes in Mathematics 973, Springer-Verlag, New 
York and Berlin. 


J. Cullum and R.A. Willoughby, eds. (1986). Large Scale Eigenvalue Prob- 
lems, North-Holland, Amsterdam. 


A. Wouk, ed. (1986). New Computing Environments: Parallel, Vector, and 
Systolic, SIAM Publications, Philadelphia, PA. 


M.T. Heath, ed. (1986). Proceedings of First SIAM Conference on Hypercube 
Multiprocessors, SIAM Publications, Philadelphia, PA. 


M.T. Heath, ed. (1987). Hypercube Multiprocessors, SIAM Publications, 
Philadelphia, PA. 


G. Fox, ed. (1988). The Third Conference on Hypercube Concurrent Computers 
and Applications, Vol. II - Applications, ACM Press, New York. 


M.H. Schultz, ed. (1988). Numerical Algorithms for Modern Parallel Com- 
puter Architectures, IMA Volumes in Mathematics and Its Applications, 
Number 13, Springer-Verlag, Berlin. 


E.F. Deprettere, ed. (1988). SVD and Signal Processing. Elsevier, Ams- 
terdam. 


B.N. Datta, C.R. Johnson, M.A. Kaashoek, R. Plemmons, and E.D. Sontag, 
eds. (1988). Linear Algebra in Signals, Systems, and Control, SIAM 
Publications, Philadelphia, PA. 


J. Dongarra, I. Duff, P. Gaffney, and S. McKee, eds. (1989). Vector and 
Parallel Computing, Ellis Horwood, Chichester, England. 


O. Axelsson, ed. (1989). “Preconditioned Conjugate Gradient Methods,” 
BIT 29:4. 


K. Gallivan, M. Heath, E. Ng, J. Ortega, B. Peyton, R. Plemmons, C. 
Romine, A. Sameh, and B. Voigt (1990). Parallel Algorithms for Matrix 
Computations, SIAM Publications, Philadelphia, PA. 


G.H. Golub and P. Van Dooren, eds. (1991). Numerical Linear Alge- 
bra, Digital Signal Processing, and Parallel Algorithms. Springer-Verlag, 
Berlin. 


R. Vaccaro, ed. (1991). SVD and Signal Processing II: Algorithms, Analy- 
sis, and Applications. Elsevier, Amsterdam. 




R. Beauwens and P. de Groen, eds. (1992). Iterative Methods in Linear 
Algebra, Elsevier (North-Holland), Amsterdam. 


R.J. Plemmons and C.D. Meyer, eds. (1993). Linear Algebra, Markov 
Chains, and Queuing Models, Springer-Verlag, New York. 


M.S. Moonen, G.H. Golub, and B.L.R. de Moor, eds. (1993). Linear 
Algebra for Large Scale and Real-Time Applications, Kluwer, Dordrecht, 
The Netherlands. 


J.D. Brown, M.T. Chu, D.C. Ellison, and R.J. Plemmons, eds. (1994). Proceedings 
of the Cornelius Lanczos International Centenary Conference, 
SIAM Publications, Philadelphia, PA. 


R.V. Patel, A.J. Laub, and P.M. Van Dooren, eds. (1994). Numerical 
Linear Algebra Techniques for Systems and Control, IEEE Press, Pis- 
cataway, New Jersey. 


J. Lewis, ed. (1994). Proceedings of the Fifth SIAM Conference on Applied 
Linear Algebra, SIAM Publications, Philadelphia, PA. 


A. Bojanczyk and G. Cybenko, eds. (1995). Linear Algebra for Signal 
Processing, IMA Volumes in Mathematics and Its Applications, Springer- 
Verlag, New York. 


M. Moonen and B. De Moor, eds. (1995). SVD and Signal Processing III: 
Algorithms, Analysis, and Applications, Elsevier, Amsterdam. 


Matrix Computations 


Chapter 1 


Matrix Multiplication 
Problems 


§1.1 Basic Algorithms and Notation 

§1.2 Exploiting Structure 

§1.3 Block Matrices and Algorithms 

§1.4 Vectorization and Re-Use Issues 


The proper study of matrix computations begins with the study of the 
matrix-matrix multiplication problem. Although this problem is simple 
mathematically it is very rich from the computational point of view. We 
begin in §1.1 by looking at the several ways that the matrix multiplica- 
tion problem can be organized. The “language” of partitioned matrices 
is established and used to characterize several linear algebraic “levels” of 
computation. 

If a matrix has structure, then it is usually possible to exploit it. For 
example, a symmetric matrix can be stored in half the space as a general 
matrix. A matrix-vector product that involves a matrix with many zero 
entries may require much less time to execute than a full matrix times a 
vector. These matters are discussed in §1.2. 

In §1.3 block matrix notation is established. A block matrix is a matrix 
with matrix entries. This concept is very important from the standpoint of 
both theory and practice. On the theoretical side, block matrix notation 
allows us to prove important matrix factorizations very succinctly. These 
factorizations are the cornerstone of numerical linear algebra. From the 
computational point of view, block algorithms are important because they 
are rich in matrix multiplication, the operation of choice for many new high 
performance computer architectures. 

These new architectures require the algorithm designer to pay as much 
attention to memory traffic as to the actual amount of arithmetic. This 
aspect of scientific computation is illustrated in §1.4 where the critical is- 
sues of vector pipeline computing are discussed: stride, vector length, the 
number of vector loads and stores, and the level of vector re-use. 


Before You Begin 


It is important to be familiar with the MATLAB language. See the 
texts by Pratap (1995) and Van Loan (1996). A richer introduction to high 
performance matrix computations is given in Dongarra, Duff, Sorensen, and 
van der Vorst (1991). This chapter's LAPACK connections include 


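LAPACK: Some General Operations 

    $x \leftarrow \alpha x$                                |
    $\mu \leftarrow x^T y$                                 |
    $y \leftarrow \alpha x + y$                            |
    $y \leftarrow \alpha A x + \beta y$                    | Matrix-vector multiplication 
    $A \leftarrow A + \alpha x y^T$                        | Rank-1 update 
    $C \leftarrow \alpha A B + \beta C$                    | Matrix multiplication 

    $y \leftarrow \alpha A x + \beta y$                    | Matrix-vector multiplication 
    $y \leftarrow \alpha A x + \beta y$                    | Matrix-vector multiplication (Packed) 
    $A \leftarrow \alpha x x^T + A$                        | Rank-1 update 
    $A \leftarrow \alpha x y^T + \alpha y x^T + A$         | Rank-2 update 
    $C \leftarrow \alpha A A^T + \beta C$                  | Rank-k update 
    $C \leftarrow \alpha A B^T + \alpha B A^T + \beta C$   | Rank-2k update 
                                                           | Symmetric/General Product 

    $B \leftarrow \alpha A B$ (or $BA$)                    |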


1.1 Basic Algorithms and Notation 


Matrix computations are built upon a hierarchy of linear algebraic opera- 
tions. Dot products involve the scalar operations of addition and multipli- 
cation. Matrix-vector multiplication is made up of dot products. Matrix- 
matrix multiplication amounts to a collection of matrix-vector products. 
All of these operations can be described in algorithmic form or in the lan- 
guage of linear algebra. Our primary objective in this section is to show 
how these two styles of expression complement each other. Along the way 
we pick up notation and acquaint the reader with the kind of thinking that 
underpins the matrix computation area. The discussion revolves around 
the matrix multiplication problem, a computation that can be organized in 
several ways. 


1.1.1 Matrix Notation 


Let $\mathbb{R}$ denote the set of real numbers. We denote the vector space of all 
m-by-n real matrices by $\mathbb{R}^{m\times n}$: 

$$A \in \mathbb{R}^{m\times n} \iff A = (a_{ij}) = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}, \qquad a_{ij} \in \mathbb{R}.$$

If a capital letter is used to denote a matrix (e.g. $A$, $B$, $\Delta$), then the 
corresponding lower case letter with subscript ij refers to the (i,j) entry 
(e.g., $a_{ij}$, $b_{ij}$, $\delta_{ij}$). As appropriate, we also use the notation $[A]_{ij}$ and 
A(i,j) to designate the matrix elements. 


1.1.2 Matrix Operations 
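
Basic matrix operations include transposition ($\mathbb{R}^{m\times n} \to \mathbb{R}^{n\times m}$), 

$$C = A^T \Longrightarrow c_{ij} = a_{ji},$$

addition ($\mathbb{R}^{m\times n} \times \mathbb{R}^{m\times n} \to \mathbb{R}^{m\times n}$), 

$$C = A + B \Longrightarrow c_{ij} = a_{ij} + b_{ij},$$

scalar-matrix multiplication ($\mathbb{R} \times \mathbb{R}^{m\times n} \to \mathbb{R}^{m\times n}$), 

$$C = \alpha A \Longrightarrow c_{ij} = \alpha a_{ij},$$

and matrix-matrix multiplication ($\mathbb{R}^{m\times p} \times \mathbb{R}^{p\times n} \to \mathbb{R}^{m\times n}$), 

$$C = AB \Longrightarrow c_{ij} = \sum_{k=1}^{p} a_{ik}b_{kj}.$$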


These are the building blocks of matrix computations. 


1.1.3 Vector Notation 

Let $\mathbb{R}^n$ denote the vector space of real n-vectors: 

$$x \in \mathbb{R}^n \iff x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \qquad x_i \in \mathbb{R}.$$

We refer to $x_i$ as the ith component of x. Depending upon context, the 
alternative notations $[x]_i$ and x(i) are sometimes used. 

Notice that we are identifying $\mathbb{R}^n$ with $\mathbb{R}^{n\times 1}$ and so the members of 
$\mathbb{R}^n$ are column vectors. On the other hand, the elements of $\mathbb{R}^{1\times n}$ are row 
vectors: 

$$x \in \mathbb{R}^{1\times n} \iff x = (x_1, \ldots, x_n).$$

If x is a column vector, then $y = x^T$ is a row vector. 


1.1.4 Vector Operations 


Assume $a \in \mathbb{R}$, $x \in \mathbb{R}^n$, and $y \in \mathbb{R}^n$. Basic vector operations include 
scalar-vector multiplication, 

$$z = ax \Longrightarrow z_i = ax_i,$$

vector addition, 

$$z = x + y \Longrightarrow z_i = x_i + y_i,$$

the dot product (or inner product), 

$$c = x^Ty \Longrightarrow c = \sum_{i=1}^{n} x_i y_i,$$

and vector multiply (or the Hadamard product) 

$$z = x\,.\!*\,y \Longrightarrow z_i = x_i y_i.$$

Another very important operation which we write in “update form” is the 
saxpy: 

$$y = ax + y \Longrightarrow y_i = ax_i + y_i$$

Here, the symbol “=” is being used to denote assignment, not mathematical 
equality. The vector y is being updated. The name “saxpy” is used in 
LAPACK, a software package that implements many of the algorithms in 
this book. One can think of “saxpy” as a mnemonic for “scalar a x plus y.” 




1.1.5 The Computation of Dot Products and Saxpys 


We have chosen to express algorithms in a stylized version of the MATLAB 
language. MATLAB is a powerful interactive system that is ideal for matrix 
computation work. We gradually introduce our stylized MATLAB notation 
in this chapter beginning with an algorithm for computing dot products. 


Algorithm 1.1.1 (Dot Product) If $x, y \in \mathbb{R}^n$, then this algorithm 
computes their dot product $c = x^Ty$. 

    c = 0 
    for i = 1:n 
        c = c + x(i)y(i) 
    end 


The dot product of two n-vectors involves n multiplications and n additions. 
It is an "O(n)" operation, meaning that the amount of work is linear in 
the dimension. The saxpy computation is also an O(n) operation, but it 
returns a vector instead of a scalar. 


Algorithm 1.1.2 (Saxpy) If $x, y \in \mathbb{R}^n$ and $a \in \mathbb{R}$, then this algorithm 
overwrites y with ax + y. 

    for i = 1:n 
        y(i) = ax(i) + y(i) 
    end 


It must be stressed that the algorithms in this book are encapsulations of 
critical computational ideas and not “production codes.” 
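
As an illustration only, Algorithms 1.1.1 and 1.1.2 can be transcribed almost line for line into Python. The sketch below assumes vectors are stored as plain Python lists with 0-based indexing (the algorithms above are 1-based); the function names `dot` and `saxpy` are our own labels, not library routines.

```python
def dot(x, y):
    """Algorithm 1.1.1: return c = x^T y for two vectors of equal length."""
    c = 0
    for i in range(len(x)):
        c = c + x[i] * y[i]
    return c


def saxpy(a, x, y):
    """Algorithm 1.1.2: overwrite y with a*x + y and return it."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
    return y


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
c = dot(x, y)        # 1*4 + 2*5 + 3*6 = 32.0
saxpy(2.0, x, y)     # y is updated in place to [6.0, 9.0, 12.0]
```

Note that `saxpy` mutates its third argument, which mirrors the “update form” of the operation: no new vector is created.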


1.1.6 Matrix-Vector Multiplication and the Gaxpy 

Suppose $A \in \mathbb{R}^{m\times n}$ and that we wish to compute the update 

$$y = Ax + y$$

where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ are given. This generalized saxpy operation is 
referred to as a gaxpy. A standard way that this computation proceeds is 
to update the components one at a time: 

$$y_i = \sum_{j=1}^{n} a_{ij}x_j + y_i, \qquad i = 1:m.$$

This gives the following algorithm. 


Algorithm 1.1.3 (Gaxpy: Row Version) If $A \in \mathbb{R}^{m\times n}$, $x \in \mathbb{R}^n$, and 
$y \in \mathbb{R}^m$, then this algorithm overwrites y with Ax + y. 

    for i = 1:m 
        for j = 1:n 
            y(i) = A(i,j)x(j) + y(i) 
        end 
    end 


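An alternative algorithm results if we regard Ax as a linear combination of 
A's columns, e.g., 

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}\begin{bmatrix} 7 \\ 8 \end{bmatrix} = \begin{bmatrix} 1\cdot 7 + 2\cdot 8 \\ 3\cdot 7 + 4\cdot 8 \\ 5\cdot 7 + 6\cdot 8 \end{bmatrix} = 7\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} + 8\begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 23 \\ 53 \\ 83 \end{bmatrix}.$$


Algorithm 1.1.4 (Gaxpy: Column Version) If $A \in \mathbb{R}^{m\times n}$, $x \in \mathbb{R}^n$, 
and $y \in \mathbb{R}^m$, then this algorithm overwrites y with Ax + y. 

    for j = 1:n 
        for i = 1:m 
            y(i) = A(i,j)x(j) + y(i) 
        end 
    end 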


Note that the inner loop in either gaxpy algorithm carries out a saxpy 
operation. The column version was derived by rethinking what matrix- 
vector multiplication “means” at the vector level, but it could also have 
been obtained simply by interchanging the order of the loops in the row 
version. In matrix computations, it is important to relate loop interchanges 
to the underlying linear algebra. 
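
To see that the two loop orders really do compute the same update, the following Python sketch transcribes both gaxpy versions and applies them to the 3-by-2 example above. It assumes a matrix is stored as a list of rows; the function names are hypothetical and chosen only for this illustration.

```python
def gaxpy_row(A, x, y):
    """Algorithm 1.1.3: overwrite y with A*x + y, sweeping A by rows."""
    m, n = len(A), len(x)
    for i in range(m):
        for j in range(n):
            y[i] = A[i][j] * x[j] + y[i]
    return y


def gaxpy_col(A, x, y):
    """Algorithm 1.1.4: overwrite y with A*x + y, sweeping A by columns."""
    m, n = len(A), len(x)
    for j in range(n):
        for i in range(m):
            y[i] = A[i][j] * x[j] + y[i]
    return y


A = [[1, 2], [3, 4], [5, 6]]
x = [7, 8]
y_row = gaxpy_row(A, x, [0, 0, 0])   # [23, 53, 83]
y_col = gaxpy_col(A, x, [0, 0, 0])   # [23, 53, 83]
```

Only the order of the two `for` statements differs between the functions; the innermost statement is identical.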


1.1.7 Partitioning a Matrix into Rows and Columns 


Algorithms 1.1.3 and 1.1.4 access the data in A by row and by column 
respectively. To highlight these orientations more clearly we introduce the 
language of partitioned matrices. 

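From the row point of view, a matrix is a stack of row vectors: 

$$A \in \mathbb{R}^{m\times n} \iff A = \begin{bmatrix} r_1^T \\ \vdots \\ r_m^T \end{bmatrix}, \qquad r_k \in \mathbb{R}^n. \tag{1.1.1}$$

This is called a row partition of A. Thus, if we row partition 

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix},$$

then we are choosing to think of A as a collection of rows with 

$$r_1^T = [\,1 \;\; 2\,], \qquad r_2^T = [\,3 \;\; 4\,], \qquad r_3^T = [\,5 \;\; 6\,].$$

With the row partitioning (1.1.1) Algorithm 1.1.3 can be expressed as follows: 

    for i = 1:m 
        y(i) = r_i^T x + y(i) 
    end 


Alternatively, a matrix is a collection of column vectors: 

$$A \in \mathbb{R}^{m\times n} \iff A = [\,c_1, \ldots, c_n\,], \qquad c_k \in \mathbb{R}^m. \tag{1.1.2}$$

We refer to this as a column partition of A. In the 3-by-2 example above, we 
thus would set $c_1$ and $c_2$ to be the first and second columns of A respectively: 

$$c_1 = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}, \qquad c_2 = \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix}.$$

With (1.1.2) we see that Algorithm 1.1.4 is a saxpy procedure that accesses 
A by columns: 

    for j = 1:n 
        y = x_j c_j + y 
    end 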


In this context appreciate y as a running vector sum that undergoes re- 
peated saxpy updates. 


1.1.8 The Colon Notation 


A handy way to specify a column or row of a matrix is with the “colon” 
notation. If $A \in \mathbb{R}^{m\times n}$, then A(k,:) designates the kth row, i.e., 

$$A(k,:) = [\,a_{k1}, \ldots, a_{kn}\,].$$

The kth column is specified by 

$$A(:,k) = \begin{bmatrix} a_{1k} \\ \vdots \\ a_{mk} \end{bmatrix}.$$

With these conventions we can rewrite Algorithms 1.1.3 and 1.1.4 as 

    for i = 1:m 
        y(i) = A(i,:)x + y(i) 
    end 

and 

    for j = 1:n 
        y = x(j)A(:,j) + y 
    end 

respectively. With the colon notation we are able to suppress iteration 
details. This frees us to think at the vector level and focus on larger com- 
putational issues. 


1.1.9 The Outer Product Update 


As a preliminary application of the colon notation, we use it to understand 
the outer product update 


$$A = A + xy^T, \qquad A \in \mathbb{R}^{m\times n},\; x \in \mathbb{R}^m,\; y \in \mathbb{R}^n.$$

The outer product operation $xy^T$ “looks funny” but is perfectly legal, e.g., 

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\begin{bmatrix} 4 & 5 \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 8 & 10 \\ 12 & 15 \end{bmatrix}.$$

This is because $xy^T$ is the product of two “skinny” matrices and the number 
of columns in the left matrix x equals the number of rows in the right matrix 
$y^T$. The entries in the outer product update are prescribed by 

    for i = 1:m 
        for j = 1:n 
            a_ij = a_ij + x_i y_j 
        end 
    end 

The mission of the j loop is to add a multiple of $y^T$ to the i-th row of A, 
i.e., 

    for i = 1:m 
        A(i,:) = A(i,:) + x(i)y^T 
    end 

On the other hand, if we make the i-loop the inner loop, then its task is to 
add a multiple of x to the jth column of A: 

    for j = 1:n 
        A(:,j) = A(:,j) + y(j)x 
    end 


Note that both outer product algorithms amount to a set of saxpy updates. 
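
The row-oriented and column-oriented forms of the outer product update can be sketched in Python as follows. This is a minimal illustration assuming list-of-rows storage; the function names are ours. Starting from a zero matrix, both reproduce the 3-by-2 outer product of x = (1, 2, 3) and y = (4, 5).

```python
def outer_update_by_rows(A, x, y):
    """Overwrite A with A + x*y^T; the inner loop updates row i of A."""
    for i in range(len(x)):
        for j in range(len(y)):
            A[i][j] = A[i][j] + x[i] * y[j]
    return A


def outer_update_by_cols(A, x, y):
    """Overwrite A with A + x*y^T; the inner loop updates column j of A."""
    for j in range(len(y)):
        for i in range(len(x)):
            A[i][j] = A[i][j] + y[j] * x[i]
    return A


x = [1, 2, 3]
y = [4, 5]
by_rows = outer_update_by_rows([[0, 0] for _ in x], x, y)
by_cols = outer_update_by_cols([[0, 0] for _ in x], x, y)
# Both results are [[4, 5], [8, 10], [12, 15]].
```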
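

1.1.10 Matrix-Matrix Multiplication 


Consider the 2-by-2 matrix-matrix multiplication AB. In the dot product 
formulation each entry is computed as a dot product: 

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1\cdot 5 + 2\cdot 7 & 1\cdot 6 + 2\cdot 8 \\ 3\cdot 5 + 4\cdot 7 & 3\cdot 6 + 4\cdot 8 \end{bmatrix}.$$

In the saxpy version each column in the product is regarded as a linear 
combination of columns of A: 

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \left[\; 5\begin{bmatrix} 1 \\ 3 \end{bmatrix} + 7\begin{bmatrix} 2 \\ 4 \end{bmatrix}, \;\; 6\begin{bmatrix} 1 \\ 3 \end{bmatrix} + 8\begin{bmatrix} 2 \\ 4 \end{bmatrix} \;\right].$$

Finally, in the outer product version, the result is regarded as the sum of 
outer products: 

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}\begin{bmatrix} 5 & 6 \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \end{bmatrix}\begin{bmatrix} 7 & 8 \end{bmatrix}.$$

Although equivalent mathematically, it turns out that these versions of 
matrix multiplication can have very different levels of performance because 
of their memory traffic properties. This matter is pursued in §1.4. For now, 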
it is worth detailing the above three approaches to matrix multiplication 
because it gives us a chance to review notation and to practice thinking at 
different linear algebraic levels. 


1.1.11  Scalar-Level Specifications 

To fix the discussion we focus on the following matrix multiplication update: 

$$C = AB + C, \qquad A \in \mathbb{R}^{m\times p},\; B \in \mathbb{R}^{p\times n},\; C \in \mathbb{R}^{m\times n}.$$

The starting point is the familiar triply-nested loop algorithm: 

Algorithm 1.1.5 (Matrix Multiplication: ijk Variant) If $A \in \mathbb{R}^{m\times p}$, 
$B \in \mathbb{R}^{p\times n}$, and $C \in \mathbb{R}^{m\times n}$ are given, then this algorithm overwrites C with 
AB + C. 

    for i = 1:m 
        for j = 1:n 
            for k = 1:p 
                C(i,j) = A(i,k)B(k,j) + C(i,j) 
            end 
        end 
    end 

This is the “ijk variant” because we identify the rows of C (and A) with i, 
the columns of C (and B) with j, and the summation index with k. 

We consider the update C = AB + C instead of just C = AB for two 
reasons. We do not have to bother with C = 0 initializations and updates 
of the form C = AB + C arise more frequently in practice. 

The three loops in the matrix multiplication update can be arbitrarily 
ordered giving 3! = 6 variations. Thus, 


    for j = 1:n 
        for k = 1:p 
            for i = 1:m 
                C(i,j) = A(i,k)B(k,j) + C(i,j) 
            end 
        end 
    end 


is the jki variant. Each of the six possibilities (ijk, jik, ikj, jki, kij, 
kji) features an inner loop operation (dot product or saxpy) and has its 
own pattern of data flow. For example, in the ijk variant, the inner loop 
oversees a dot product that requires access to a row of A and a column of 
B. The jki variant involves a saxpy that requires access to a column of C 
and a column of A. These attributes are summarized in Table 1.1.1 along 
with an interpretation of what is going on when the middle and inner loop 
are considered together. Each variant involves the same amount of floating 
point arithmetic, but accesses the A, B, and C data differently. 


    Loop   | Inner | Middle               | Inner Loop 
    Order  | Loop  | Loop                 | Data Access 
    -------+-------+----------------------+-------------------------- 
    ijk    | dot   | vector × matrix      | A by row, B by column 
    jik    | dot   | matrix × vector      | A by row, B by column 
    ikj    | saxpy | row gaxpy            | B by row, C by row 
    jki    | saxpy | column gaxpy         | A by column, C by column 
    kij    | saxpy | row outer product    | B by row, C by row 
    kji    | saxpy | column outer product | A by column, C by column 


TABLE 1.1.1. Matrix Multiplication: Loop Orderings and Properties 
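
The claim that all six loop orderings compute the same update can be checked mechanically. The Python sketch below is only an illustration: `matmul_update` is a hypothetical helper of our own that takes the ordering as a three-letter string and runs the same innermost statement under that nesting. Integer data is used so that the six results agree exactly.

```python
from itertools import permutations


def matmul_update(A, B, C, order="ijk"):
    """Overwrite C with A*B + C using the loop nesting given by `order`.

    `order` is any arrangement of the letters i, j, k, outermost loop first.
    """
    m, p, n = len(A), len(B), len(B[0])
    extent = {"i": m, "j": n, "k": p}
    outer, middle, inner = order
    for a in range(extent[outer]):
        for b in range(extent[middle]):
            for c in range(extent[inner]):
                # Map the three loop counters back to the roles i, j, k.
                idx = {outer: a, middle: b, inner: c}
                i, j, k = idx["i"], idx["j"], idx["k"]
                C[i][j] = A[i][k] * B[k][j] + C[i][j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
results = {
    "".join(order): matmul_update(A, B, [[0, 0], [0, 0]], "".join(order))
    for order in permutations("ijk")
}
# Every one of the six orderings yields [[19, 22], [43, 50]].
```

In floating point arithmetic the six variants may differ in the last few bits because the additions are performed in different orders; the sketch sidesteps this by using integers.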


1.1.12 A Dot Product Formulation 


The usual matrix multiplication procedure regards AB as an array of dot 
products to be computed one at a time in left-to-right, top-to-bottom order. 
This is the idea behind Algorithm 1.1.5. Using the colon notation we can 
highlight this dot-product formulation: 


Algorithm 1.1.6 (Matrix Multiplication: Dot Product Version) 
If $A \in \mathbb{R}^{m\times p}$, $B \in \mathbb{R}^{p\times n}$, and $C \in \mathbb{R}^{m\times n}$ are given, then this algorithm 
overwrites C with AB + C. 

    for i = 1:m 
        for j = 1:n 
            C(i,j) = A(i,:)B(:,j) + C(i,j) 
        end 
    end 

In the language of partitioned matrices, if 

$$A = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix}, \qquad a_k \in \mathbb{R}^p$$

and 

$$B = [\,b_1, \ldots, b_n\,], \qquad b_k \in \mathbb{R}^p$$

then Algorithm 1.1.6 has this interpretation: 

    for i = 1:m 
        for j = 1:n 
            c_ij = a_i^T b_j + c_ij 
        end 
    end 


Note that the “mission” of the j-loop is to compute the ith row of the 
update. To emphasize this we could write 

    for i = 1:m 
        c_i^T = a_i^T B + c_i^T 
    end 

where 

$$C = \begin{bmatrix} c_1^T \\ \vdots \\ c_m^T \end{bmatrix}$$

is a row partitioning of C. To say the same thing with the colon notation 
we write 

    for i = 1:m 
        C(i,:) = A(i,:)B + C(i,:) 
    end 

Either way we see that the inner two loops of the ijk variant define a 
row-oriented gaxpy operation. 


1.1.13 A Saxpy Formulation 


Suppose A and C are column-partitioned as follows 

$$A = [\,a_1, \ldots, a_p\,], \qquad a_j \in \mathbb{R}^m$$

$$C = [\,c_1, \ldots, c_n\,], \qquad c_j \in \mathbb{R}^m.$$

By comparing jth columns in C = AB + C we see that 

$$c_j = \sum_{k=1}^{p} b_{kj}a_k + c_j, \qquad j = 1:n.$$

These vector sums can be put together with a sequence of saxpy updates. 

Algorithm 1.1.7 (Matrix Multiplication: Saxpy Version) If the matrices 
$A \in \mathbb{R}^{m\times p}$, $B \in \mathbb{R}^{p\times n}$, and $C \in \mathbb{R}^{m\times n}$ are given, then this algorithm 
overwrites C with AB + C. 

    for j = 1:n 
        for k = 1:p 
            C(:,j) = A(:,k)B(k,j) + C(:,j) 
        end 
    end 


Note that the k-loop oversees a gaxpy operation: 

    for j = 1:n 
        C(:,j) = AB(:,j) + C(:,j) 
    end 


1.1.14 An Outer Product Formulation 

Consider the kji variant of Algorithm 1.1.5: 

    for k = 1:p 
        for j = 1:n 
            for i = 1:m 
                C(i,j) = A(i,k)B(k,j) + C(i,j) 
            end 
        end 
    end 


The inner two loops oversee the outer product update 

$$C = a_k b_k^T + C$$

where 

$$A = [\,a_1, \ldots, a_p\,] \quad\text{and}\quad B = \begin{bmatrix} b_1^T \\ \vdots \\ b_p^T \end{bmatrix} \tag{1.1.3}$$

with $a_k \in \mathbb{R}^m$ and $b_k \in \mathbb{R}^n$. We therefore obtain 


Algorithm 1.1.8 (Matrix Multiplication: Outer Product Version) 
If $A \in \mathbb{R}^{m\times p}$, $B \in \mathbb{R}^{p\times n}$, and $C \in \mathbb{R}^{m\times n}$ are given, then this algorithm 
overwrites C with AB + C. 

    for k = 1:p 
        C = A(:,k)B(k,:) + C 
    end 


This implementation revolves around the fact that AB is the sum of p outer 
products. 
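
A Python rendering of the outer product version might look as follows. It is a sketch under the assumption of list-of-rows storage, so the kth column of A has to be gathered explicitly while the kth row of B is available directly; `matmul_outer` is our own name for the routine.

```python
def matmul_outer(A, B, C):
    """Algorithm 1.1.8: overwrite C with A*B + C as a sum of p outer products."""
    m, p, n = len(A), len(B), len(B[0])
    for k in range(p):
        a_k = [A[i][k] for i in range(m)]   # kth column of A
        b_k = B[k]                          # kth row of B
        # Outer product update C = a_k * b_k^T + C.
        for i in range(m):
            for j in range(n):
                C[i][j] = a_k[i] * b_k[j] + C[i][j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul_outer(A, B, [[0, 0], [0, 0]])   # [[19, 22], [43, 50]]
```

After the first pass of the k-loop C holds the outer product of (1, 3) and (5, 6); the second pass adds the outer product of (2, 4) and (7, 8), matching the 2-by-2 expansion given earlier in the section.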


1.1.15 The Notion of “Level” 


The dot product and saxpy operations are examples of “level-1” operations. 
Level-1 operations involve an amount of data and an amount of arithmetic 
that is linear in the dimension of the operation. An m-by-n outer product 
update or gaxpy operation involves a quadratic amount of data (O(mn)) 
and a quadratic amount of work (O(mn)). They are examples of “level-2” 
operations. 

The matrix update C = AB + C is a “level-3” operation. Level-3 
operations involve a quadratic amount of data and a cubic amount of work. 
If A, B, and C are n-by-n matrices, then C = AB + C involves $O(n^2)$ 
matrix entries and $O(n^3)$ arithmetic operations. 

The design of matrix algorithms that are rich in high-level linear 
algebra operations is a recurring theme in the book. For example, a high 
performance linear equation solver may require a level-3 organization of 
Gaussian elimination. This requires some algorithmic rethinking because 
that method is usually specified in level-1 terms, e.g., “multiply row 1 by a 
scalar and add the result to row 2.” 


1.1.16 A Note on Matrix Equations 


In striving to understand matrix multiplication via outer products, we 
essentially established the matrix equation 

$$AB = \sum_{k=1}^{p} a_k b_k^T$$

where the $a_k$ and $b_k$ are defined by the partitionings in (1.1.3). 

Numerous matrix equations are developed in subsequent chapters. Sometimes 
they are established algorithmically like the above outer product 
expansion and other times they are proved at the ij-component level. As 
an example of the latter, we prove an important result that characterizes 
transposes of products. 


Theorem 1.1.1 If $A \in \mathbb{R}^{m\times p}$ and $B \in \mathbb{R}^{p\times n}$, then $(AB)^T = B^TA^T$. 

Proof. If $C = (AB)^T$, then 

$$c_{ij} = [(AB)^T]_{ij} = [AB]_{ji} = \sum_{k=1}^{p} a_{jk}b_{ki}.$$

On the other hand, if $D = B^TA^T$, then 

$$d_{ij} = [B^TA^T]_{ij} = \sum_{k=1}^{p} [B^T]_{ik}[A^T]_{kj} = \sum_{k=1}^{p} b_{ki}a_{jk}.$$

Since $c_{ij} = d_{ij}$ for all i and j, it follows that C = D. □ 


Scalar-level proofs such as this one are usually not very insightful. However, 
they are sometimes the only way to proceed. 
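
A numerical spot check is no substitute for the proof, but it can help in getting the index bookkeeping straight. The Python sketch below compares the transpose of a product with the reversed product of transposes on one small rectangular example with m = 2, p = 3, n = 2. The helpers `transpose` and `matmul` are written here for illustration and assume list-of-rows storage.

```python
def transpose(A):
    """Return A^T for a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Return the product AB, each entry computed as a dot product."""
    m, p, n = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
            for i in range(m)]


A = [[1, 2, 3],
     [4, 5, 6]]            # 2-by-3
B = [[1, 0],
     [2, 1],
     [0, 3]]               # 3-by-2

lhs = transpose(matmul(A, B))              # (AB)^T
rhs = matmul(transpose(B), transpose(A))   # B^T A^T
# Both sides equal [[5, 14], [11, 23]].
```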


1.1.17 Complex Matrices 


From time to time computations that involve complex matrices are dis- 
cussed. The vector space of m-by-n complex matrices is designated by 
$\mathbb{C}^{m\times n}$. The scaling, addition, and multiplication of complex matrices 
corresponds exactly to the real case. However, transposition becomes conjugate 
transposition: 

$$C = A^H \Longrightarrow c_{ij} = \bar{a}_{ji}.$$

The vector space of complex n-vectors is designated by $\mathbb{C}^n$. The dot product 
of complex n-vectors x and y is prescribed by 

$$s = x^Hy = \sum_{i=1}^{n} \bar{x}_i y_i.$$

Finally, if $A = B + iC \in \mathbb{C}^{m\times n}$, then we designate the real and imaginary 
parts of A by Re(A) = B and Im(A) = C respectively. 
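
For concreteness, the two complex operations just defined can be sketched with Python's built-in complex type. The names `conj_transpose` and `conj_dot` are our own, and matrices are again assumed to be lists of rows. Note that the conjugate falls on the first argument of the dot product, as in the definition above.

```python
def conj_transpose(A):
    """Return A^H: entry (i, j) of the result is the conjugate of a_ji."""
    m, n = len(A), len(A[0])
    return [[A[i][j].conjugate() for i in range(m)] for j in range(n)]


def conj_dot(x, y):
    """Return s = x^H y, the sum of conj(x_i) * y_i."""
    s = 0j
    for xi, yi in zip(x, y):
        s = s + xi.conjugate() * yi
    return s


x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]
s = conj_dot(x, y)        # (1-2j)(2-1j) + (3+1j)(1j) = -1-2j
norm_sq = conj_dot(x, x)  # |1+2j|^2 + |3-1j|^2 = 15, with zero imaginary part
```

With this convention the dot product of a vector with itself is always real and nonnegative, which would not be the case without the conjugation.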


Problems 


P1.1.1 Suppose $A \in \mathbb{R}^{n\times n}$ and $x \in \mathbb{R}^r$ are given. Give a saxpy algorithm for computing 
the first column of $M = (A - x_1 I)\cdots(A - x_r I)$. 

P1.1.2 In the conventional 2-by-2 matrix multiplication C = AB, there are eight 
multiplications: $a_{11}b_{11}$, $a_{11}b_{12}$, $a_{21}b_{11}$, $a_{21}b_{12}$, $a_{12}b_{21}$, $a_{12}b_{22}$, $a_{22}b_{21}$ and $a_{22}b_{22}$. Make 
a table that indicates the order that these multiplications are performed for the ijk, jik, 
kij, ikj, jki, and kji matrix multiply algorithms. 

P1.1.3 Give an algorithm for computing $C = (xy^T)^k$ where x and y are n-vectors. 

P1.1.4 Specify an algorithm for computing $(XY^T)^k$ where $X, Y \in \mathbb{R}^{n\times 2}$. 

P1.1.5 Formulate an outer product algorithm for the update $C = AB^T + C$ where 
$A \in \mathbb{R}^{m\times r}$, $B \in \mathbb{R}^{n\times r}$ and $C \in \mathbb{R}^{m\times n}$. 

P1.1.6 Suppose we have real n-by-n matrices C, D, E, and F. Show how to compute 
real n-by-n matrices A and B with just three real n-by-n matrix multiplications so that 
(A + iB) = (C + iD)(E + iF). Hint: Compute W = (C + D)(E − F). 


Notes and References for Sec. 1.1 


It must be stressed that the development of quality software from any of our 
“semi-formal” algorithmic presentations is a long and arduous task. Even the 
implementation of the level-1, 2, and 3 BLAS requires care: 


C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). “Basic Linear 
Algebra Subprograms for FORTRAN Usage,” ACM Trans. Math. Soft. 5, 308-323. 

C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). “Algorithm 539, 
Basic Linear Algebra Subprograms for FORTRAN Usage,” ACM Trans. Math. Soft. 
5, 324-325. 

J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). “An Extended Set 
of Fortran Basic Linear Algebra Subprograms,” ACM Trans. Math. Soft. 14, 1-17. 

J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). “Algorithm 656 An 
Extended Set of Fortran Basic Linear Algebra Subprograms: Model Implementation 
and Test Programs,” ACM Trans. Math. Soft. 14, 18-32. 

J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). “A Set of Level 3 
Basic Linear Algebra Subprograms,” ACM Trans. Math. Soft. 16, 1-17. 

J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). “Algorithm 679. A 
Set of Level 3 Basic Linear Algebra Subprograms: Model Implementation and Test 
Programs,” ACM Trans. Math. Soft. 16, 18-28. 


Other BLAS references include 


B. Kågström, P. Ling, and C. Van Loan (1991). “High-Performance Level-3 BLAS: 
Sample Routines for Double Precision Real Data,” in High Performance Computing 
II, M. Durand and F. El Dabaghi (eds), North-Holland, 269-281. 

B. Kågström, P. Ling, and C. Van Loan (1995). “GEMM-Based Level-3 BLAS: 
High-Performance Model Implementations and Performance Evaluation Benchmark,” in 
Parallel Programming and Applications, P. Fritzson and L. Finmo (eds), IOS Press, 
184-188. 


For an appreciation of the subtleties associated with software development we recommend 


J.R. Rice (1981). Matrix Computations and Mathematical Software, Academic Press, 
New York. 


and a browse through the LAPACK manual. 


1.2 Exploiting Structure 


The efficiency of a given matrix algorithm depends on many things. Most 
obvious and what we treat in this section is the amount of required arith- 
metic and storage. We continue to use matrix-vector and matrix-matrix 
multiplication as a vehicle for introducing the key ideas. As examples of 
exploitable structure we have chosen the properties of bandedness and sym- 
metry. Band matrices have many zero entries and so it is no surprise that 
band matrix manipulation allows for many arithmetic and storage shortcuts. 
Arithmetic complexity and data structures are discussed in this context. 

Symmetric matrices provide another set of examples that can be used to 
illustrate structure exploitation. Symmetric linear systems and eigenvalue 
problems have a very prominent role to play in matrix computations and 
so it is important to be familiar with their manipulation. 


1.2.1 Band Matrices and the ×-0 Notation 


We say that $A \in \mathbb{R}^{m\times n}$ has lower bandwidth p if $a_{ij} = 0$ whenever i > j + p 
and upper bandwidth q if j > i + q implies $a_{ij} = 0$. Here is an example of 
an 8-by-5 matrix that has lower bandwidth 1 and upper bandwidth 2: 

$$\begin{bmatrix} \times & \times & \times & 0 & 0 \\ \times & \times & \times & \times & 0 \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & \times \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

The ×'s designate arbitrary nonzero entries. This notation is handy to 
indicate the zero-nonzero structure of a matrix and we use it extensively. 
Band structures that occur frequently are tabulated in Table 1.2.1. 
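
The two bandwidth definitions translate directly into a scan over the nonzero entries. The following Python sketch uses a hypothetical helper, `bandwidths`, that returns the smallest p and q for which the definitions hold. It is applied to a matrix with the same 8-by-5 zero-nonzero pattern as the example above, with every × replaced by 1 (0-based indices do not affect the differences i − j).

```python
def bandwidths(A):
    """Return (p, q), the smallest lower and upper bandwidths of A.

    A nonzero a_ij with i > j forces p >= i - j; one with j > i forces q >= j - i.
    """
    p = q = 0
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            if a != 0:
                p = max(p, i - j)
                q = max(q, j - i)
    return p, q


# 8-by-5 pattern: nonzero exactly where i - j <= 1 and j - i <= 2.
A = [[1 if (i - j <= 1 and j - i <= 2) else 0 for j in range(5)]
     for i in range(8)]
p, q = bandwidths(A)   # (1, 2)
```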


1.2.2 Diagonal Matrix Manipulation 


Matrices with upper and lower bandwidth zero are diagonal. If $D \in \mathbb{R}^{m\times n}$ 
is diagonal, then 

$$D = \mathrm{diag}(d_1, \ldots, d_q), \quad q = \min\{m, n\} \iff d_i = d_{ii}.$$


If D is diagonal and A is a matrix, then DA is a row scaling of A and AD 
is a column scaling of A. 
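
Because D has only its diagonal entries to contribute, neither product needs a general matrix multiply. The Python sketch below assumes a square diagonal D represented just by the list d of its diagonal entries and a matrix stored as a list of rows; the function names are ours.

```python
def scale_rows(d, A):
    """Return D*A where D = diag(d): row i of A is multiplied by d[i]."""
    return [[d[i] * a for a in row] for i, row in enumerate(A)]


def scale_cols(A, d):
    """Return A*D where D = diag(d): column j of A is multiplied by d[j]."""
    return [[a * d[j] for j, a in enumerate(row)] for row in A]


A = [[1, 2], [3, 4]]
d = [10, 100]
DA = scale_rows(d, A)   # [[10, 20], [300, 400]]
AD = scale_cols(A, d)   # [[10, 200], [30, 400]]
```

Each routine performs one multiplication per entry of A and never stores the zeros of D.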


    Type               | Lower     | Upper 
    of Matrix          | Bandwidth | Bandwidth 
    -------------------+-----------+----------- 
    diagonal           | 0         | 0 
    upper triangular   | 0         | n-1 
    lower triangular   | m-1       | 0 
    tridiagonal        | 1         | 1 
    upper bidiagonal   | 0         | 1 
    lower bidiagonal   | 1         | 0 
    upper Hessenberg   | 1         | n-1 
    lower Hessenberg   | m-1       | 1 


TABLE 1.2.1. Band Terminology for m-by-n Matrices 


1.2.3 Triangular Matrix Multiplication 


To introduce band matrix “thinking” we look at the matrix multiplication 
problem C = AB when A and B are both n-by-n and upper triangular. 
The 3-by-3 case is illuminating: 


$$C = \begin{bmatrix} a_{11}b_{11} & a_{11}b_{12} + a_{12}b_{22} & a_{11}b_{13} + a_{12}b_{23} + a_{13}b_{33} \\ 0 & a_{22}b_{22} & a_{22}b_{23} + a_{23}b_{33} \\ 0 & 0 & a_{33}b_{33} \end{bmatrix}.$$

It suggests that the product is upper triangular and that its upper 
triangular entries are the result of abbreviated inner products. Indeed, since 
$a_{ik}b_{kj} = 0$ whenever k < i or j < k we see that 

$$c_{ij} = \sum_{k=i}^{j} a_{ik}b_{kj}$$

and so we obtain: 


Algorithm 1.2.1 (Triangular Matrix Multiplication) If $A, B \in \mathbb{R}^{n\times n}$ 
are upper triangular, then this algorithm computes C = AB. 

    C = 0 
    for i = 1:n 
        for j = i:n 
            for k = i:j 
                C(i,j) = A(i,k)B(k,j) + C(i,j) 
            end 
        end 
    end 
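
Algorithm 1.2.1 carries over to Python with only the index ranges needing care, since Python ranges are 0-based and exclude their upper limit. The sketch below is illustrative and assumes list-of-rows storage; entries below the diagonal of C are never touched and remain zero.

```python
def triu_matmul(A, B):
    """Algorithm 1.2.1: return C = A*B for n-by-n upper triangular A and B."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):             # only entries on or above the diagonal
            for k in range(i, j + 1):     # abbreviated inner product, k = i:j
                C[i][j] = A[i][k] * B[k][j] + C[i][j]
    return C


A = [[1, 2, 3],
     [0, 4, 5],
     [0, 0, 6]]
B = [[1, 1, 1],
     [0, 1, 1],
     [0, 0, 1]]
C = triu_matmul(A, B)   # [[1, 3, 6], [0, 4, 9], [0, 0, 6]]
```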

To quantify the savings in this algorithm we need some tools for measuring 
the amount of work. 


1.2.4 Flops 


Obviously, upper triangular matrix multiplication involves less arithmetic 
than when the matrices are full. One way to quantify this is with the notion 
of a flop. A flop¹ is a floating point operation. A dot product or saxpy
operation of length $n$ involves $2n$ flops because there are $n$ multiplications
and $n$ adds in either of these vector operations.

The gaxpy $y = Ax + y$ where $A \in \mathbb{R}^{m \times n}$ involves $2mn$ flops as does an
m-by-n outer product update of the form $A = A + xy^T$.

The matrix multiply update $C = AB + C$ where $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{p \times n}$,
and $C \in \mathbb{R}^{m \times n}$ involves $2mnp$ flops.

Flop counts are usually obtained by summing the amount of arithmetic 
associated with the most deeply nested statements in an algorithm. For 
matrix-matrix multiplication, this is the statement, 


    C(i,j) = A(i,k)B(k,j) + C(i,j)


which involves two flops and is executed mnp times as a simple loop ac- 
counting indicates. Hence the conclusion that general matrix multiplication 
requires 2mnp flops. 

Now let us investigate the amount of work involved in Algorithm 1.2.1. 
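Note that $c_{ij}$ $(i \le j)$ requires $2(j - i + 1)$ flops. Using the heuristics

$$\sum_{p=1}^{q} p = \frac{q(q+1)}{2} \approx \frac{q^2}{2}$$

and

$$\sum_{p=1}^{q} p^2 = \frac{q^3}{3} + \frac{q^2}{2} + \frac{q}{6} \approx \frac{q^3}{3}$$

we find that triangular matrix multiplication requires one-sixth the number
of flops as full matrix multiplication:

$$\sum_{i=1}^{n} \sum_{j=i}^{n} 2(j - i + 1) = \sum_{i=1}^{n} \sum_{j=1}^{n-i+1} 2j \approx \sum_{i=1}^{n} \frac{2(n - i + 1)^2}{2} = \sum_{i=1}^{n} i^2 \approx \frac{n^3}{3}.$$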


We throw away the low order terms since their inclusion does not contribute
to what the flop count "says." For example, an exact flop count of Algorithm
1.2.1 reveals that precisely $n^3/3 + n^2 + 2n/3$ flops are involved. For
large $n$ (the typical situation of interest) we see that the exact flop count
offers no insight beyond the $n^3/3$ approximation.

¹In the first edition of this book we defined a flop to be the amount of work associated
with an operation of the form $a_{ij} = a_{ij} + a_{ik}a_{kj}$, i.e., a floating point add, a floating
point multiply, and some subscripting. Thus, an "old flop" involves two "new flops." In
defining a flop to be a single floating point operation we are opting for a more precise
measure of arithmetic complexity.
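
The exact count quoted above can be checked by brute force. The following Python sketch simply tallies two flops per pass through the innermost statement of Algorithm 1.2.1 and compares the total with $n^3/3 + n^2 + 2n/3$; the function name `triangular_flops` is an illustrative choice.

```python
def triangular_flops(n):
    """Count the flops executed by the triple loop of Algorithm 1.2.1."""
    flops = 0
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            flops += 2 * (j - i + 1)   # one multiply and one add for each k = i..j
    return flops


for n in (10, 100, 1000):
    exact = triangular_flops(n)
    closed_form = (n**3 + 3 * n**2 + 2 * n) // 3   # n^3/3 + n^2 + 2n/3
    # The last column is the ratio of the exact count to the n^3/3 estimate.
    print(n, exact, closed_form, exact / (n**3 / 3))
```

As $n$ grows the printed ratio drifts toward one, which is the sense in which the low order terms carry no extra insight.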

Flop counting is a necessarily crude approach to the measuring of program
efficiency since it ignores subscripting, memory traffic, and the countless
other overheads associated with program execution. We must not infer
too much from a comparison of flop counts. We cannot conclude, for example,
that triangular matrix multiplication is six times faster than square
matrix multiplication. Flop counting is just a "quick and dirty" accounting
method that captures only one of the several dimensions of the efficiency
issue.


1.2.5 The Colon Notation—Again 


The dot product that the k-loop performs in Algorithm 1.2.1 can be succinctly
stated if we extend the colon notation introduced in §1.1.8. Suppose
$A \in \mathbb{R}^{m \times n}$ and the integers $p$, $q$, and $r$ satisfy $1 \le p \le q \le n$ and $1 \le r \le m$.
We then define

$$A(r, p{:}q) = [\, a_{rp}, \ldots, a_{rq} \,] \in \mathbb{R}^{1 \times (q-p+1)}.$$

Likewise, if $1 \le p \le q \le m$ and $1 \le c \le n$, then

$$A(p{:}q, c) = \begin{bmatrix} a_{pc} \\ \vdots \\ a_{qc} \end{bmatrix} \in \mathbb{R}^{q-p+1}.$$

With this notation we can rewrite Algorithm 1.2.1 as

    C(1:n,1:n) = 0
    for i = 1:n
        for j = i:n
            C(i,j) = A(i,i:j)B(i:j,j) + C(i,j)
        end
    end

We mention one additional feature of the colon notation. Negative increments
are allowed. Thus, if $x$ and $y$ are n-vectors, then $s = x^T y(n{:}-1{:}1)$
is the summation

$$s = \sum_{i=1}^{n} x_i y_{n-i+1}.$$


1.2.6 Band Storage


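Suppose $A \in \mathbb{R}^{n \times n}$ has lower bandwidth $p$ and upper bandwidth $q$ and
assume that $p$ and $q$ are much smaller than $n$. Such a matrix can be stored
in a $(p + q + 1)$-by-$n$ array A.band with the convention that

$$a_{ij} = A.band(i - j + q + 1,\; j) \qquad (1.2.1)$$

for all $(i, j)$ that fall inside the band. Thus, if

$$
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & 0 & 0 & 0 \\
a_{21} & a_{22} & a_{23} & a_{24} & 0 & 0 \\
0 & a_{32} & a_{33} & a_{34} & a_{35} & 0 \\
0 & 0 & a_{43} & a_{44} & a_{45} & a_{46} \\
0 & 0 & 0 & a_{54} & a_{55} & a_{56} \\
0 & 0 & 0 & 0 & a_{65} & a_{66}
\end{bmatrix}
$$

then

$$
A.band = \begin{bmatrix}
0 & 0 & a_{13} & a_{24} & a_{35} & a_{46} \\
0 & a_{12} & a_{23} & a_{34} & a_{45} & a_{56} \\
a_{11} & a_{22} & a_{33} & a_{44} & a_{55} & a_{66} \\
a_{21} & a_{32} & a_{43} & a_{54} & a_{65} & 0
\end{bmatrix}
$$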

Here, the "0" entries are unused. With this data structure, our column-oriented
gaxpy algorithm transforms to the following:

Algorithm 1.2.2 (Band Gaxpy) Suppose $A \in \mathbb{R}^{n \times n}$ has lower bandwidth
$p$ and upper bandwidth $q$ and is stored in the A.band format (1.2.1).
If $x, y \in \mathbb{R}^n$, then this algorithm overwrites $y$ with $Ax + y$.

    for j = 1:n
        ytop = max(1, j - q)
        ybot = min(n, j + p)
        atop = max(1, q + 2 - j)
        abot = atop + ybot - ytop
        y(ytop:ybot) = x(j)A.band(atop:abot, j) + y(ytop:ybot)
    end

Notice that by storing A by column in A.band, we obtain a saxpy, column
access procedure. Indeed, Algorithm 1.2.2 is obtained from Algorithm 1.1.4
by recognizing that each saxpy involves a vector with a small number of
nonzeros. Integer arithmetic is used to identify the location of these nonzeros.
As a result of this careful zero/nonzero analysis, the algorithm involves
just $2n(p + q + 1)$ flops with the assumption that $p$ and $q$ are much smaller
than $n$.
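
To make the index bookkeeping of Algorithm 1.2.2 concrete, here is a hedged Python sketch. It assumes 0-based lists of rows, and the helper names `to_band` and `band_gaxpy` are hypothetical; the packing helper follows convention (1.2.1) and exists only so that the gaxpy can be tried on a dense example.

```python
def to_band(A, p, q):
    """Pack a dense n-by-n band matrix into a (p+q+1)-by-n array per (1.2.1)."""
    n = len(A)
    band = [[0.0] * n for _ in range(p + q + 1)]
    for j in range(1, n + 1):                      # 1-based column index
        for i in range(max(1, j - q), min(n, j + p) + 1):
            band[i - j + q][j - 1] = A[i - 1][j - 1]   # 1-based row i-j+q+1
    return band


def band_gaxpy(band, p, q, x, y):
    """Overwrite y with A*x + y, touching only entries inside the band."""
    n = len(x)
    for j in range(1, n + 1):
        ytop = max(1, j - q)
        ybot = min(n, j + p)
        atop = max(1, q + 2 - j)
        for r in range(ybot - ytop + 1):           # short saxpy for column j
            y[ytop - 1 + r] += x[j - 1] * band[atop - 1 + r][j - 1]
    return y
```

Each pass through the outer loop performs one saxpy whose length is at most $p + q + 1$, which is where the $2n(p + q + 1)$ flop bound comes from.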


1.2.7 Symmetry

We say that $A \in \mathbb{R}^{n \times n}$ is symmetric if $A^T = A$. Thus,

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}$$

is symmetric. Storage requirements can be halved if we just store the lower
triangle of elements, e.g., A.vec = [ 1 2 3 4 5 6 ]. In general, with
this data structure we agree to store the $a_{ij}$ as follows:

$$a_{ij} = A.vec((j - 1)n - j(j - 1)/2 + i) \quad (i \ge j) \qquad (1.2.2)$$


Let us look at the column-oriented gaxpy operation with the matrix A 
represented in A.vec. 


Algorithm 1.2.3 (Symmetric Storage Gaxpy) Suppose $A \in \mathbb{R}^{n \times n}$ is
symmetric and stored in the A.vec style (1.2.2). If $x, y \in \mathbb{R}^n$, then this
algorithm overwrites $y$ with $Ax + y$.

    for j = 1:n
        for i = 1:j-1
            y(i) = A.vec((i - 1)n - i(i - 1)/2 + j)x(j) + y(i)
        end
        for i = j:n
            y(i) = A.vec((j - 1)n - j(j - 1)/2 + i)x(j) + y(i)
        end
    end

This algorithm requires the same $2n^2$ flops that an ordinary gaxpy requires.
Notice that the halving of the storage requirement is purchased with some
awkward subscripting.
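
The subscripting in Algorithm 1.2.3 is easier to trust once it has been run. Below is a small Python sketch under the assumption that A.vec is a 0-based list; `vec_index` and `sym_gaxpy` are illustrative names, and the data is the 3-by-3 example from this subsection.

```python
def vec_index(i, j, n):
    """0-based position in A.vec of a_ij (1-based, i >= j), following (1.2.2)."""
    return (j - 1) * n - j * (j - 1) // 2 + i - 1


def sym_gaxpy(avec, x, y):
    """Overwrite y with A*x + y where symmetric A is stored as in (1.2.2)."""
    n = len(x)
    for j in range(1, n + 1):
        for i in range(1, j):                # above the diagonal: a_ij = a_ji
            y[i - 1] += avec[vec_index(j, i, n)] * x[j - 1]
        for i in range(j, n + 1):            # on or below the diagonal
            y[i - 1] += avec[vec_index(i, j, n)] * x[j - 1]
    return y


avec = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # lower triangle, column by column
y = sym_gaxpy(avec, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(y)   # row sums of the example matrix
```

Multiplying by the vector of ones returns the row sums of A, which gives a quick sanity check on the index map.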


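1.2.8 Store by Diagonal

Symmetric matrices can also be stored by diagonal. If

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix},$$

then in a store-by-diagonal scheme we represent A with the vector

    A.diag = [ 1 4 6 2 5 3 ].

In general, the entry $a_{i+k,i}$ on the $k$th subdiagonal is stored as follows:

$$a_{i+k,i} = A.diag(i + nk - k(k - 1)/2) \quad (k \ge 0) \qquad (1.2.3)$$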


Some notation simplifies the discussion of how to use this data structure in 
a matrix-vector multiplication. 

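If $A \in \mathbb{R}^{m \times n}$, then let $D(A, k) \in \mathbb{R}^{m \times n}$ designate the $k$th diagonal of A
as follows:

$$
[D(A, k)]_{ij} = \begin{cases} a_{ij} & j = i + k,\ 1 \le i \le m,\ 1 \le j \le n \\ 0 & \text{otherwise.} \end{cases}
$$

Thus,

$$
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}
= \underbrace{\begin{bmatrix} 0 & 0 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}}_{D(A,2)}
+ \underbrace{\begin{bmatrix} 0 & 2 & 0 \\ 0 & 0 & 5 \\ 0 & 0 & 0 \end{bmatrix}}_{D(A,1)}
+ \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 6 \end{bmatrix}}_{D(A,0)}
+ \underbrace{\begin{bmatrix} 0 & 0 & 0 \\ 2 & 0 & 0 \\ 0 & 5 & 0 \end{bmatrix}}_{D(A,-1)}
+ \underbrace{\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 3 & 0 & 0 \end{bmatrix}}_{D(A,-2)}
$$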


Returning to our store-by-diagonal data structure, we see that the nonzero 
parts of D(A,0), D(A,1),..., D(A,n — 1) are sequentially stored in the 
A.diag scheme (1.2.3). The gaxpy $y = Ax + y$ can then be organized as
follows:

$$y = D(A,0)x + \sum_{k=1}^{n-1} \left( D(A,k) + D(A,k)^T \right) x + y.$$


Working out the details we obtain the following algorithm. 


Algorithm 1.2.4 (Store-By-Diagonal Gaxpy) Suppose $A \in \mathbb{R}^{n \times n}$ is
symmetric and stored in the A.diag style (1.2.3). If $x, y \in \mathbb{R}^n$, then this
algorithm overwrites $y$ with $Ax + y$.

    for i = 1:n
        y(i) = A.diag(i)x(i) + y(i)
    end
    for k = 1:n-1
        t = nk - k(k - 1)/2
        {y = D(A,k)x + y}
        for i = 1:n - k
            y(i) = A.diag(i + t)x(i + k) + y(i)
        end
        {y = D(A,k)^T x + y}
        for i = 1:n - k
            y(i + k) = A.diag(i + t)x(i) + y(i + k)
        end
    end

Note that the inner loops oversee vector multiplications:

    y(1:n - k) = A.diag(t + 1:t + n - k) .* x(k + 1:n) + y(1:n - k)
    y(k + 1:n) = A.diag(t + 1:t + n - k) .* x(1:n - k) + y(k + 1:n)
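
A minimal Python sketch of the store-by-diagonal gaxpy follows, assuming A.diag is a 0-based list. The name `diag_gaxpy` is illustrative, and for brevity the two inner loops of Algorithm 1.2.4 are fused into one loop over each diagonal, which is a departure from the pseudocode rather than a restatement of it.

```python
def diag_gaxpy(adiag, x, y):
    """Overwrite y with A*x + y where symmetric A is stored by diagonal (1.2.3)."""
    n = len(x)
    for i in range(n):                       # main diagonal, D(A,0)
        y[i] += adiag[i] * x[i]
    for k in range(1, n):
        t = n * k - k * (k - 1) // 2         # offset of the kth diagonal in A.diag
        for i in range(n - k):
            a = adiag[t + i]                 # a_{i+k,i} = a_{i,i+k}
            y[i] += a * x[i + k]             # contribution of D(A,k) x
            y[i + k] += a * x[i]             # contribution of D(A,k)^T x
    return y


adiag = [1.0, 4.0, 6.0, 2.0, 5.0, 3.0]       # the 3-by-3 example above
print(diag_gaxpy(adiag, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

Every stored entry is read exactly once, and the entries of each diagonal sit contiguously in memory, which is the attraction of this layout.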

1.2.9 A Note on Overwriting and Workspaces


An undercurrent in the above discussion has been the economical use of 
storage. Overwriting input data is another way to control the amount of 
memory that a matrix computation requires. Consider the n-by-n matrix 
multiplication problem $C = AB$ with the proviso that the "input matrix"
$B$ is to be overwritten by the "output matrix" $C$. We cannot simply
transform

    C(1:n,1:n) = 0
    for j = 1:n
        for k = 1:n
            C(:,j) = C(:,j) + A(:,k)B(k,j)
        end
    end

to

    for j = 1:n
        for k = 1:n
            B(:,j) = B(:,j) + A(:,k)B(k,j)
        end
    end

because B(:,j) is needed throughout the entire k-loop. A linear workspace
is needed to hold the jth column of the product until it is "safe" to overwrite
B(:,j):

    for j = 1:n
        w(1:n) = 0
        for k = 1:n
            w(:) = w(:) + A(:,k)B(k,j)
        end
        B(:,j) = w(:)
    end


A linear workspace overhead is usually not important in a matrix compu- 
tation that has a 2-dimensional array of the same order. 
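
The workspace idea can be sketched in Python as follows. The function name `overwrite_b_with_ab` and the list-of-rows layout are assumptions made for illustration; the single vector `w` plays the role of the linear workspace.

```python
def overwrite_b_with_ab(A, B):
    """Overwrite B with A*B using one length-n workspace vector."""
    n = len(A)
    w = [0.0] * n
    for j in range(n):
        for i in range(n):
            w[i] = 0.0
        for k in range(n):
            bkj = B[k][j]                    # column j of B is still intact here
            for i in range(n):
                w[i] += A[i][k] * bkj        # saxpy: w = w + A(:,k) B(k,j)
        for i in range(n):
            B[i][j] = w[i]                   # now it is safe to overwrite B(:,j)
    return B


B = overwrite_b_with_ab([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
print(B)
```

Only $n$ extra storage locations are used, compared with the $n^2$ that a separate output array would need.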


Problems 


P1.2.1 Give an algorithm that overwrites $A$ with $A^2$ where $A \in \mathbb{R}^{n \times n}$ is (a) upper
triangular and (b) square. Strive for a minimum workspace in each case.

P1.2.2 Suppose $A \in \mathbb{R}^{n \times n}$ is upper Hessenberg and that scalars $\lambda_1, \ldots, \lambda_r$ are given.
Give a saxpy algorithm for computing the first column of $M = (A - \lambda_1 I) \cdots (A - \lambda_r I)$.

P1.2.3 Give a column saxpy algorithm for the n-by-n matrix multiplication problem
$C = AB$ where $A$ is upper triangular and $B$ is lower triangular.

P1.2.4 Extend Algorithm 1.2.2 so that it can handle rectangular band matrices. Be
sure to describe the underlying data structure.

P1.2.5 $A \in \mathbb{C}^{n \times n}$ is Hermitian if $A^H = A$. If $A = B + iC$, then it is easy to show that
$B^T = B$ and $C^T = -C$. Suppose we represent $A$ in an array A.herm with the property
that A.herm(i, j) houses $b_{ij}$ if $i \ge j$ and $c_{ij}$ if $j > i$. Using this data structure write a
matrix-vector multiply function that computes Re(z) and Im(z) from Re(x) and Im(x)
so that $z = Ax$.

P1.2.6 Suppose $X \in \mathbb{R}^{n \times p}$ and $A \in \mathbb{R}^{n \times n}$, with $A$ symmetric and stored by diagonal.
Give an algorithm that computes $Y = X^T A X$ and stores the result by diagonal. Use
separate arrays for $A$ and $Y$.

P1.2.7 Suppose $a \in \mathbb{R}^n$ is given and that $A \in \mathbb{R}^{n \times n}$ has the property that
$a_{ij} = a_{|i-j|+1}$. Give an algorithm that overwrites $y$ with $Ax + y$ where $x, y \in \mathbb{R}^n$ are given.

P1.2.8 Suppose $a \in \mathbb{R}^n$ is given and that $A \in \mathbb{R}^{n \times n}$ has the property that
$a_{ij} = a_{((i+j-1) \bmod n)+1}$. Give an algorithm that overwrites $y$ with $Ax + y$ where $x, y \in \mathbb{R}^n$
are given.

P1.2.9 Develop a compact store-by-diagonal scheme for unsymmetric band matrices
and write the corresponding gaxpy algorithm.

P1.2.10 Suppose $p$ and $q$ are n-vectors and that $A = (a_{ij})$ is defined by $a_{ij} = a_{ji} = p_i q_j$
for $1 \le i \le j \le n$. How many flops are required to compute $y = Ax$ where $x \in \mathbb{R}^n$ is
given?


Notes and References for Sec. 1.2 


Consult the LAPACK manual for a discussion about appropriate data structures when 
symmetry and/or bandedness is present. See also 


N. Madsen, G. Roderigue, and J. Karush (1976). "Matrix Multiplication by Diagonals
on a Vector Parallel Processor," Information Processing Letters 5, 41-45.


1.3 Block Matrices and Algorithms 


Having a facility with block matrix notation is crucial in matrix computations
because it simplifies the derivation of many central algorithms. Moreover,
"block algorithms" are increasingly important in high performance
computing. By a block algorithm we essentially mean an algorithm that
is rich in matrix-matrix multiplication. Algorithms of this type turn out
to be more efficient in many computing environments than those that are
organized at a lower linear algebraic level.


1.3.1 Block Matrix Notation 


Column and row partitionings are special cases of matrix blocking. In 
general we can partition both the rows and columns of an m-by-n matrix 
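$A$ to obtain

$$
A = \begin{bmatrix}
A_{11} & \cdots & A_{1r} \\
\vdots & & \vdots \\
\underbrace{A_{q1}}_{n_1} & \cdots & \underbrace{A_{qr}}_{n_r}
\end{bmatrix}
\begin{matrix} m_1 \\ \vdots \\ m_q \end{matrix}
$$

where $m_1 + \cdots + m_q = m$, $n_1 + \cdots + n_r = n$, and $A_{\alpha\beta}$ designates the
$(\alpha, \beta)$ block or submatrix. With this notation, block $A_{\alpha\beta}$ has dimension
$m_\alpha$-by-$n_\beta$ and we say that $A = (A_{\alpha\beta})$ is a q-by-r block matrix.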


1.3.2 Block Matrix Manipulation 


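Block matrices combine just like matrices with scalar entries as long as
certain dimension requirements are met. For example, if

$$
B = \begin{bmatrix}
B_{11} & \cdots & B_{1r} \\
\vdots & & \vdots \\
\underbrace{B_{q1}}_{n_1} & \cdots & \underbrace{B_{qr}}_{n_r}
\end{bmatrix}
\begin{matrix} m_1 \\ \vdots \\ m_q \end{matrix}
$$

then we say that $B$ is partitioned conformably with the matrix $A$ above.
The sum $C = A + B$ can also be regarded as a q-by-r block matrix:

$$
C = \begin{bmatrix}
C_{11} & \cdots & C_{1r} \\
\vdots & & \vdots \\
C_{q1} & \cdots & C_{qr}
\end{bmatrix}
= \begin{bmatrix}
A_{11} + B_{11} & \cdots & A_{1r} + B_{1r} \\
\vdots & & \vdots \\
A_{q1} + B_{q1} & \cdots & A_{qr} + B_{qr}
\end{bmatrix}.
$$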


The multiplication of block matrices is a little trickier. We start with a pair 
of lemmas. 


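Lemma 1.3.1 If $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{p \times n}$,

$$
A = \begin{bmatrix} A_1 \\ \vdots \\ A_q \end{bmatrix}
\begin{matrix} m_1 \\ \vdots \\ m_q \end{matrix},
\qquad
B = \begin{bmatrix} \underbrace{B_1}_{n_1} & \cdots & \underbrace{B_r}_{n_r} \end{bmatrix},
$$

then

$$
AB = C = \begin{bmatrix}
C_{11} & \cdots & C_{1r} \\
\vdots & & \vdots \\
\underbrace{C_{q1}}_{n_1} & \cdots & \underbrace{C_{qr}}_{n_r}
\end{bmatrix}
\begin{matrix} m_1 \\ \vdots \\ m_q \end{matrix}
$$

where $C_{\alpha\beta} = A_\alpha B_\beta$ for $\alpha = 1{:}q$ and $\beta = 1{:}r$.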
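Proof. First we relate scalar entries in block $C_{\alpha\beta}$ to scalar entries in $C$.
For $1 \le \alpha \le q$, $1 \le \beta \le r$, $1 \le i \le m_\alpha$, and $1 \le j \le n_\beta$ we have

$$[C_{\alpha\beta}]_{ij} = c_{\lambda+i,\,\mu+j}$$

where

$$\lambda = m_1 + \cdots + m_{\alpha-1}, \qquad \mu = n_1 + \cdots + n_{\beta-1}.$$

But

$$c_{\lambda+i,\,\mu+j} = \sum_{k=1}^{p} a_{\lambda+i,k}\, b_{k,\mu+j} = \sum_{k=1}^{p} [A_\alpha]_{ik} [B_\beta]_{kj} = [A_\alpha B_\beta]_{ij}.$$

Thus, $C_{\alpha\beta} = A_\alpha B_\beta$. □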


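Lemma 1.3.2 If $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{p \times n}$,

$$
A = \begin{bmatrix} \underbrace{A_1}_{p_1} & \cdots & \underbrace{A_s}_{p_s} \end{bmatrix},
\qquad \text{and} \qquad
B = \begin{bmatrix} B_1 \\ \vdots \\ B_s \end{bmatrix}
\begin{matrix} p_1 \\ \vdots \\ p_s \end{matrix},
$$

then

$$AB = C = \sum_{\gamma=1}^{s} A_\gamma B_\gamma.$$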


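Proof. We set $s = 2$ and leave the general $s$ case to the reader. (See
P1.3.6.) For $1 \le i \le m$ and $1 \le j \le n$ we have

$$
c_{ij} = \sum_{k=1}^{p} a_{ik}b_{kj} = \sum_{k=1}^{p_1} a_{ik}b_{kj} + \sum_{k=p_1+1}^{p_1+p_2} a_{ik}b_{kj}
= [A_1 B_1]_{ij} + [A_2 B_2]_{ij} = [A_1 B_1 + A_2 B_2]_{ij}.
$$

Thus, $C = A_1 B_1 + A_2 B_2$. □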


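For general block matrix multiplication we have the following result:

Theorem 1.3.3 If

$$
A = \begin{bmatrix}
A_{11} & \cdots & A_{1s} \\
\vdots & & \vdots \\
\underbrace{A_{q1}}_{p_1} & \cdots & \underbrace{A_{qs}}_{p_s}
\end{bmatrix}
\begin{matrix} m_1 \\ \vdots \\ m_q \end{matrix},
\qquad
B = \begin{bmatrix}
B_{11} & \cdots & B_{1r} \\
\vdots & & \vdots \\
\underbrace{B_{s1}}_{n_1} & \cdots & \underbrace{B_{sr}}_{n_r}
\end{bmatrix}
\begin{matrix} p_1 \\ \vdots \\ p_s \end{matrix},
$$

and we partition the product $C = AB$ as follows,

$$
C = \begin{bmatrix}
C_{11} & \cdots & C_{1r} \\
\vdots & & \vdots \\
\underbrace{C_{q1}}_{n_1} & \cdots & \underbrace{C_{qr}}_{n_r}
\end{bmatrix}
\begin{matrix} m_1 \\ \vdots \\ m_q \end{matrix},
$$

then

$$C_{\alpha\beta} = \sum_{\gamma=1}^{s} A_{\alpha\gamma}B_{\gamma\beta}, \qquad \alpha = 1{:}q, \quad \beta = 1{:}r.$$

Proof. See P1.3.7. □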


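A very important special case arises if we set $s = 2$, $r = 1$, and $n_1 = 1$:

$$
\begin{bmatrix} A_{11} & A_{12} \\ \vdots & \vdots \\ A_{q1} & A_{q2} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} A_{11}x_1 + A_{12}x_2 \\ \vdots \\ A_{q1}x_1 + A_{q2}x_2 \end{bmatrix}.
$$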


This partitioned matrix-vector product is used over and over again in sub- 
sequent chapters. 


1.3.3 Submatrix Designation 


As with "ordinary" matrix multiplication, block matrix multiplication can 
be organized in several ways. To specify the computations precisely, we 
need some notation. 

Suppose $A \in \mathbb{R}^{m \times n}$ and that $i = (i_1, \ldots, i_r)$ and $j = (j_1, \ldots, j_c)$ are
integer vectors with the property that

$$i_1, \ldots, i_r \in \{1, 2, \ldots, m\}, \qquad j_1, \ldots, j_c \in \{1, 2, \ldots, n\}.$$

We let $A(i, j)$ denote the r-by-c submatrix

$$
A(i, j) = \begin{bmatrix}
A(i_1, j_1) & \cdots & A(i_1, j_c) \\
\vdots & & \vdots \\
A(i_r, j_1) & \cdots & A(i_r, j_c)
\end{bmatrix}.
$$

If the entries in the subscript vectors $i$ and $j$ are contiguous, then the
"colon" notation can be used to define $A(i, j)$ in terms of the scalar entries
in $A$. In particular, if $1 \le i_1 \le i_2 \le m$ and $1 \le j_1 \le j_2 \le n$, then
$A(i_1{:}i_2,\, j_1{:}j_2)$ is the submatrix obtained by extracting rows $i_1$ through $i_2$
and columns $j_1$ through $j_2$, e.g.,

$$
A(3{:}5,\, 1{:}2) = \begin{bmatrix} a_{31} & a_{32} \\ a_{41} & a_{42} \\ a_{51} & a_{52} \end{bmatrix}.
$$

While on the subject of submatrices, recall from §1.1.8 that if $i$ and $j$ are
scalars, then A(i,:) designates the ith row of A and A(:,j) designates the
jth column of A.


1.3.4 Block Matrix Times Vector 


An important situation covered by Theorem 1.3.3 is the case of a block 
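matrix times vector. Let us consider the details of the gaxpy $y = Ax + y$
where $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, and

$$
A = \begin{bmatrix} A_1 \\ \vdots \\ A_q \end{bmatrix}
\begin{matrix} m_1 \\ \vdots \\ m_q \end{matrix},
\qquad
y = \begin{bmatrix} y_1 \\ \vdots \\ y_q \end{bmatrix}
\begin{matrix} m_1 \\ \vdots \\ m_q \end{matrix}.
$$

We refer to $A_i$ as the ith block row. If m.vec = $(m_1, \ldots, m_q)$ is the vector
of block row "heights", then from

$$
\begin{bmatrix} y_1 \\ \vdots \\ y_q \end{bmatrix}
= \begin{bmatrix} A_1 \\ \vdots \\ A_q \end{bmatrix} x
+ \begin{bmatrix} y_1 \\ \vdots \\ y_q \end{bmatrix}
$$

we obtain

    last = 0
    for i = 1:q
        first = last + 1
        last = first + m.vec(i) - 1                          (1.3.1)
        y(first:last) = A(first:last,:)x + y(first:last)
    end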


Each time through the loop an “ordinary” gaxpy is performed so Algorithms 
1.1.3 and 1.1.4 apply. 

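Another way to block the gaxpy computation is to partition $A$ and $x$ as
follows:

$$
A = \begin{bmatrix} \underbrace{A_1}_{n_1} & \cdots & \underbrace{A_r}_{n_r} \end{bmatrix},
\qquad
x = \begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix}
\begin{matrix} n_1 \\ \vdots \\ n_r \end{matrix}.
$$

In this case we refer to $A_j$ as the jth block column of $A$. If n.vec =
$(n_1, \ldots, n_r)$ is the vector of block column widths, then from

$$
y = \begin{bmatrix} A_1 & \cdots & A_r \end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix} + y
= \sum_{j=1}^{r} A_j x_j + y
$$

we obtain

    last = 0
    for j = 1:r
        first = last + 1
        last = first + n.vec(j) - 1                          (1.3.2)
        y = A(:,first:last)x(first:last) + y
    end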


Again, the gaxpy’s performed each time through the loop can be carried 
out with Algorithm 1.1.3 or 1.1.4. 
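
Both blockings of the gaxpy can be sketched in a few lines of Python. The helper `gaxpy` stands in for Algorithm 1.1.3 or 1.1.4, and the names `block_row_gaxpy` and `block_col_gaxpy` are illustrative; lists are 0-based, so each block is addressed by a half-open range rather than by first:last.

```python
def gaxpy(A, x, y):
    """Ordinary gaxpy on lists: return A*x + y."""
    return [sum(a * b for a, b in zip(row, x)) + yi for row, yi in zip(A, y)]


def block_row_gaxpy(A, x, y, m_vec):
    """(1.3.1): update y one block row at a time; m_vec holds block heights."""
    last = 0
    for height in m_vec:
        first, last = last, last + height            # rows first..last-1
        y[first:last] = gaxpy(A[first:last], x, y[first:last])
    return y


def block_col_gaxpy(A, x, y, n_vec):
    """(1.3.2): accumulate A_j x_j one block column at a time; n_vec holds widths."""
    last = 0
    for width in n_vec:
        first, last = last, last + width             # columns first..last-1
        y = gaxpy([row[first:last] for row in A], x[first:last], y)
    return y
```

The block row version updates disjoint pieces of y, while the block column version updates all of y on every pass; that difference matters later when data movement is considered.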


1.3.5 Block Matrix Multiplication 


Just as ordinary, scalar-level matrix multiplication can be arranged in several
possible ways, so can the multiplication of block matrices. Different
blockings for $A$, $B$, and $C$ can set the stage for block versions of the dot
product, saxpy, and outer product algorithms of §1.1. To illustrate this
with a minimum of subscript clutter, we assume that these three matrices
are all n-by-n and that $n = N\ell$ where $N$ and $\ell$ are positive integers.

If $A = (A_{\alpha\beta})$, $B = (B_{\alpha\beta})$, and $C = (C_{\alpha\beta})$ are N-by-N block matrices
with $\ell$-by-$\ell$ blocks, then from Theorem 1.3.3

$$C_{\alpha\beta} = \sum_{\gamma=1}^{N} A_{\alpha\gamma}B_{\gamma\beta} + C_{\alpha\beta}, \qquad \alpha = 1{:}N, \quad \beta = 1{:}N.$$

If we organize a matrix multiplication procedure around this summation,
then we obtain a block analog of Algorithm 1.1.5:

    for α = 1:N
        i = (α - 1)ℓ + 1:αℓ
        for β = 1:N
            j = (β - 1)ℓ + 1:βℓ                              (1.3.3)
            for γ = 1:N
                k = (γ - 1)ℓ + 1:γℓ
                C(i,j) = A(i,k)B(k,j) + C(i,j)
            end
        end
    end

Note that if $\ell = 1$, then $\alpha \equiv i$, $\beta \equiv j$, and $\gamma \equiv k$ and we revert to Algorithm
1.1.5.
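
The block dot product organization (1.3.3) can be illustrated with the Python sketch below. It assumes square matrices stored as lists of rows with $n$ divisible by the block size; `block_matmul_update` and the parameter name `ell` are illustrative choices.

```python
def block_matmul_update(A, B, C, ell):
    """Overwrite C with A*B + C using ell-by-ell blocks, in the order of (1.3.3)."""
    n = len(A)
    N = n // ell                                   # assumes n = N * ell exactly
    for alpha in range(N):
        i = range(alpha * ell, (alpha + 1) * ell)
        for beta in range(N):
            j = range(beta * ell, (beta + 1) * ell)
            for gamma in range(N):
                k = range(gamma * ell, (gamma + 1) * ell)
                # C(i,j) = A(i,k) B(k,j) + C(i,j) on one ell-by-ell block
                for ii in i:
                    for jj in j:
                        C[ii][jj] += sum(A[ii][kk] * B[kk][jj] for kk in k)
    return C
```

With `ell = 1` the three block loops collapse to the scalar ijk loops, which matches the remark that the procedure reverts to Algorithm 1.1.5.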
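
To obtain a block saxpy matrix multiply, we write $C = AB + C$ as

$$
\begin{bmatrix} C_1, \ldots, C_N \end{bmatrix}
= \begin{bmatrix} A_1, \ldots, A_N \end{bmatrix}
\begin{bmatrix}
B_{11} & \cdots & B_{1N} \\
\vdots & & \vdots \\
B_{N1} & \cdots & B_{NN}
\end{bmatrix}
+ \begin{bmatrix} C_1, \ldots, C_N \end{bmatrix}
$$

where $A_\alpha, C_\alpha \in \mathbb{R}^{n \times \ell}$, and $B_{\alpha\beta} \in \mathbb{R}^{\ell \times \ell}$. From this we obtain

    for β = 1:N
        j = (β - 1)ℓ + 1:βℓ
        for α = 1:N
            i = (α - 1)ℓ + 1:αℓ                              (1.3.4)
            C(:,j) = A(:,i)B(i,j) + C(:,j)
        end
    end

This is the block version of Algorithm 1.1.7.

A block outer product scheme results if we work with the blockings

$$
A = \begin{bmatrix} A_1, \ldots, A_N \end{bmatrix},
\qquad
B = \begin{bmatrix} B_1^T \\ \vdots \\ B_N^T \end{bmatrix}
$$

where $A_\gamma, B_\gamma \in \mathbb{R}^{n \times \ell}$. From Lemma 1.3.2 we have

$$C = \sum_{\gamma=1}^{N} A_\gamma B_\gamma^T + C$$

and so

    for γ = 1:N
        k = (γ - 1)ℓ + 1:γℓ
        C = A(:,k)B(k,:) + C                                 (1.3.5)
    end

This is the block version of Algorithm 1.1.8.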


1.3.6 Complex Matrix Multiplication

Consider the complex matrix multiplication update

$$C_1 + iC_2 = (A_1 + iA_2)(B_1 + iB_2) + (C_1 + iC_2)$$

where all the matrices are real and $i^2 = -1$. Comparing the real and
imaginary parts we find

$$
\begin{aligned}
C_1 &= A_1 B_1 - A_2 B_2 + C_1 \\
C_2 &= A_1 B_2 + A_2 B_1 + C_2
\end{aligned}
$$

and this can be expressed as follows:

$$
\begin{bmatrix} C_1 \\ C_2 \end{bmatrix}
= \begin{bmatrix} A_1 & -A_2 \\ A_2 & A_1 \end{bmatrix}
\begin{bmatrix} B_1 \\ B_2 \end{bmatrix}
+ \begin{bmatrix} C_1 \\ C_2 \end{bmatrix}.
$$

This suggests how real matrix software might be applied to solve complex
matrix problems. The only snag is that the explicit formation of

$$\tilde{A} = \begin{bmatrix} A_1 & -A_2 \\ A_2 & A_1 \end{bmatrix}$$

requires the "double storage" of the matrices $A_1$ and $A_2$.
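
The real 2-by-2 block formulation can be tried out directly. The Python sketch below forms the double-storage matrix explicitly and multiplies it by the stacked real and imaginary parts of B; the helper names `matmul` and `complex_matmul_via_real` are illustrative, and the update term $C_1 + iC_2$ is taken to be zero for simplicity.

```python
def matmul(X, Y):
    """Conventional product of two real matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def complex_matmul_via_real(A1, A2, B1, B2):
    """Return (C1, C2) with C1 + i*C2 = (A1 + i*A2)(B1 + i*B2)."""
    m = len(A1)
    # Double-storage matrix [A1 -A2; A2 A1]: A1 and A2 each appear twice.
    top = [r1 + [-a for a in r2] for r1, r2 in zip(A1, A2)]
    bottom = [r2 + r1 for r1, r2 in zip(A1, A2)]
    stacked = matmul(top + bottom, B1 + B2)      # B1 + B2 stacks B1 over B2
    return stacked[:m], stacked[m:]
```

In practice one would avoid building the doubled matrix and instead call a real multiply four times on $A_1$, $A_2$, $B_1$, and $B_2$; the sketch forms it only to mirror the displayed equation.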


1.3.7 A Divide and Conquer Matrix Multiplication 


We conclude this section with a completely different approach to the matrix- 
matrix multiplication problem. The starting point in the discussion is the 
2-by-2 block matrix multiplication 


$$
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
$$

where each block is square. In the ordinary algorithm, $C_{ij} = A_{i1}B_{1j} +
A_{i2}B_{2j}$. There are 8 multiplies and 4 adds. Strassen (1969) has shown how
to compute $C$ with just 7 multiplies and 18 adds:

$$
\begin{aligned}
P_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
P_2 &= (A_{21} + A_{22})B_{11} \\
P_3 &= A_{11}(B_{12} - B_{22}) \\
P_4 &= A_{22}(B_{21} - B_{11}) \\
P_5 &= (A_{11} + A_{12})B_{22} \\
P_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
P_7 &= (A_{12} - A_{22})(B_{21} + B_{22}) \\
C_{11} &= P_1 + P_4 - P_5 + P_7 \\
C_{12} &= P_3 + P_5 \\
C_{21} &= P_2 + P_4 \\
C_{22} &= P_1 + P_3 - P_2 + P_6
\end{aligned}
$$

These equations are easily confirmed by substitution. Suppose $n = 2m$ so
that the blocks are m-by-m. Counting adds and multiplies in the computation
$C = AB$ we find that conventional matrix multiplication involves
$(2m)^3$ multiplies and $(2m)^3 - (2m)^2$ adds. In contrast, if Strassen's algorithm
is applied with conventional multiplication at the block level, then
$7m^3$ multiplies and $7m^3 + 11m^2$ adds are required. If $m \gg 1$, then the
Strassen method involves about 7/8ths the arithmetic of the fully conventional
algorithm.

Now recognize that we can recur on the Strassen idea. In particular, we
can apply the Strassen algorithm to each of the half-sized block multiplications
associated with the $P_i$. Thus, if the original $A$ and $B$ are n-by-n and
$n = 2^q$, then we can repeatedly apply the Strassen multiplication algorithm.
At the bottom "level," the blocks are 1-by-1. Of course, there is no need to
recur down to the $n = 1$ level. When the block size gets sufficiently small,
$(n \le n_{\min})$, it may be sensible to use conventional matrix multiplication
when finding the $P_i$. Here is the overall procedure:


Algorithm 1.3.1 (Strassen Multiplication) Suppose $n = 2^q$ and that
$A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times n}$. If $n_{\min} = 2^d$ with $d \le q$, then this algorithm
computes $C = AB$ by applying Strassen procedure recursively $q - d$ times.

    function: C = strass(A, B, n, nmin)
        if n ≤ nmin
            C = AB
        else
            m = n/2; u = 1:m; v = m + 1:n;
            P1 = strass(A(u,u) + A(v,v), B(u,u) + B(v,v), m, nmin)
            P2 = strass(A(v,u) + A(v,v), B(u,u), m, nmin)
            P3 = strass(A(u,u), B(u,v) - B(v,v), m, nmin)
            P4 = strass(A(v,v), B(v,u) - B(u,u), m, nmin)
            P5 = strass(A(u,u) + A(u,v), B(v,v), m, nmin)
            P6 = strass(A(v,u) - A(u,u), B(u,u) + B(u,v), m, nmin)
            P7 = strass(A(u,v) - A(v,v), B(v,u) + B(v,v), m, nmin)
            C(u,u) = P1 + P4 - P5 + P7
            C(u,v) = P3 + P5
            C(v,u) = P2 + P4
            C(v,v) = P1 + P3 - P2 + P6
        end


Unlike any of our previous algorithms strass is recursive, meaning that 
it calls itself. Divide and conquer algorithms are often best described in 
this manner. We have presented this algorithm in the style of a MATLAB 
function so that the recursive calls can be stated with precision. 
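
For readers who want to run the recursion, here is a Python rendering of Algorithm 1.3.1. It is a sketch under the assumptions that matrices are lists of rows and that the order is a power of two; the helpers `add`, `sub`, `block`, and `conventional` are illustrative names, and the order $n$ is read from the argument instead of being passed in.

```python
def add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def sub(X, Y):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def block(M, rows, cols):
    """Extract the submatrix M(rows, cols) for 0-based index ranges."""
    return [[M[i][j] for j in cols] for i in rows]

def conventional(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def strass(A, B, nmin):
    """Return A*B for n-by-n A and B, n a power of two, per Algorithm 1.3.1."""
    n = len(A)
    if n <= nmin:
        return conventional(A, B)
    m = n // 2
    u, v = range(m), range(m, n)
    A11, A12, A21, A22 = block(A, u, u), block(A, u, v), block(A, v, u), block(A, v, v)
    B11, B12, B21, B22 = block(B, u, u), block(B, u, v), block(B, v, u), block(B, v, v)
    P1 = strass(add(A11, A22), add(B11, B22), nmin)
    P2 = strass(add(A21, A22), B11, nmin)
    P3 = strass(A11, sub(B12, B22), nmin)
    P4 = strass(A22, sub(B21, B11), nmin)
    P5 = strass(add(A11, A12), B22, nmin)
    P6 = strass(sub(A21, A11), add(B11, B12), nmin)
    P7 = strass(sub(A12, A22), add(B21, B22), nmin)
    C11 = add(sub(add(P1, P4), P5), P7)
    C12 = add(P3, P5)
    C21 = add(P2, P4)
    C22 = add(sub(add(P1, P3), P2), P6)
    # Reassemble the four blocks into one n-by-n matrix.
    return [r1 + r2 for r1, r2 in zip(C11, C12)] + [r1 + r2 for r1, r2 in zip(C21, C22)]
```

Copying blocks into fresh lists keeps the sketch short; a serious implementation would work with views into the original arrays to avoid the extra memory traffic.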

The amount of arithmetic associated with strass is a complicated function
of $n$ and $n_{\min}$. If $n_{\min} \gg 1$, then it suffices to count multiplications
as the number of additions is roughly the same. If we just count the multiplications,
then it suffices to examine the deepest level of the recursion
as that is where all the multiplications occur. In strass there are $q - d$
subdivisions and thus, $7^{q-d}$ conventional matrix-matrix multiplications to
perform. These multiplications have size $n_{\min}$ and thus strass involves
about $s = (2^d)^3 7^{q-d}$ multiplications compared to $c = (2^q)^3$, the number
of multiplications in the conventional approach. Notice that

$$\frac{s}{c} = \left( \frac{2^d}{2^q} \right)^3 7^{q-d} = \left( \frac{7}{8} \right)^{q-d}.$$

If $d = 0$, i.e., we recur on down to the 1-by-1 level, then

$$s = \left( \frac{7}{8} \right)^q c = 7^q = n^{\log_2 7} \approx n^{2.807}.$$

Thus, asymptotically, the number of multiplications in the Strassen procedure
is $O(n^{2.807})$. However, the number of additions (relative to the number
of multiplications) becomes significant as $n_{\min}$ gets small.

Example 1.3.1 If $n = 1024$ and $n_{\min} = 64$, then strass involves $(7/8)^{10-6} \approx .6$ the
arithmetic of the conventional algorithm.
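
The multiplication counts behind Example 1.3.1 are simple enough to tabulate. This Python sketch uses the formulas $s = (2^d)^3 7^{q-d}$ and $c = (2^q)^3$ from the text; the function names are illustrative.

```python
def strass_multiplies(q, d):
    """Scalar multiplications used by strass with n = 2**q and nmin = 2**d."""
    return 7 ** (q - d) * (2 ** d) ** 3


def conventional_multiplies(q):
    """Scalar multiplications in conventional n-by-n multiplication, n = 2**q."""
    return (2 ** q) ** 3


# n = 1024 = 2**10 and nmin = 64 = 2**6, as in Example 1.3.1.
ratio = strass_multiplies(10, 6) / conventional_multiplies(10)
print(ratio)
```

The ratio depends only on $q - d$, the number of recursive subdivisions, so halving $n_{\min}$ buys one more factor of 7/8 in multiplications at the cost of more additions.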


Problems 


P1.3.1 Generalize (1.3.3) so that it can handle the variable block-size problem covered
by Theorem 1.3.3.

P1.3.2 Generalize (1.3.4) and (1.3.5) so that they can handle the variable block-size
case.

P1.3.3 Adapt strass so that it can handle square matrix multiplication of any order.
Hint: If the "current" A has odd dimension, append a zero row and column.

P1.3.4 Prove that if

$$
A = \begin{bmatrix}
A_{11} & \cdots & A_{1r} \\
\vdots & & \vdots \\
A_{q1} & \cdots & A_{qr}
\end{bmatrix}
$$

is a blocking of the matrix A, then

$$
A^T = \begin{bmatrix}
A_{11}^T & \cdots & A_{q1}^T \\
\vdots & & \vdots \\
A_{1r}^T & \cdots & A_{qr}^T
\end{bmatrix}.
$$

P1.3.5 Suppose $n$ is even and define the following function from $\mathbb{R}^n$ to $\mathbb{R}$:

$$f(x) = x(1{:}2{:}n)^T x(2{:}2{:}n) = \sum_{i=1}^{n/2} x_{2i-1}x_{2i}.$$

(a) Show that if $x, y \in \mathbb{R}^n$ then

$$x^T y = \sum_{i=1}^{n/2} (x_{2i-1} + y_{2i})(x_{2i} + y_{2i-1}) - f(x) - f(y).$$

(b) Now consider the n-by-n matrix multiplication $C = AB$. Give an algorithm for
computing this product that requires $n^3/2$ multiplies once $f$ is applied to the rows of $A$
and the columns of $B$. See Winograd (1968) for details.

P1.3.6 Prove Lemma 1.3.2 for general $s$. Hint. Set

$$\rho_\gamma = p_1 + \cdots + p_{\gamma-1}, \qquad \gamma = 1{:}s+1$$

and show that

$$c_{ij} = \sum_{\gamma=1}^{s} \sum_{k=\rho_\gamma+1}^{\rho_{\gamma+1}} a_{ik}b_{kj}.$$

P1.3.7 Use Lemmas 1.3.1 and 1.3.2 to prove Theorem 1.3.3. In particular, set

$$
A_\gamma = \begin{bmatrix} A_{1\gamma} \\ \vdots \\ A_{q\gamma} \end{bmatrix}
\qquad \text{and} \qquad
B_\gamma = \begin{bmatrix} B_{\gamma 1} & \cdots & B_{\gamma r} \end{bmatrix}
$$

and note from Lemma 1.3.2 that

$$C = \sum_{\gamma=1}^{s} A_\gamma B_\gamma.$$

Now analyze each $A_\gamma B_\gamma$ with the help of Lemma 1.3.1.


Notes and References for Sec. 1.3 


For quite some time fast methods for matrix multiplication have attracted a lot of attention
within computer science. See

S. Winograd (1968). "A New Algorithm for Inner Product," IEEE Trans. Comp. C-17,
693-694.

V. Strassen (1969). "Gaussian Elimination is Not Optimal," Numer. Math. 13, 354-356.

V. Pan (1984). "How Can We Speed Up Matrix Multiplication?," SIAM Review 26,
393-416.

Many of these methods have dubious practical value. However, with the publication of

D. Bailey (1988). "Extra High Speed Matrix Multiplication on the Cray-2," SIAM J.
Sci. and Stat. Comp. 9, 603-607.

it is clear that the blanket dismissal of these fast procedures is unwise. The "stability"
of the Strassen algorithm is discussed in §2.4.10. See also

N.J. Higham (1990). "Exploiting Fast Matrix Multiplication within the Level 3 BLAS,"
ACM Trans. Math. Soft. 16, 352-368.

C.C. Douglas, M. Heroux, G. Slishman, and R.M. Smith (1994). "GEMMW: A Portable
Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm,"
J. Comput. Phys. 110, 1-10.


1.4  Vectorization and Re-Use Issues 


The matrix manipulations discussed in this book are mostly built upon 
dot products and saxpy operations. Vector pipeline computers are able 
to perform vector operations such as these very fast because of special 
hardware that is able to exploit the fact that a vector operation is a very 
regular sequence of scalar operations. Whether or not high performance 
is extracted from such a computer depends upon the length of the vector 
operands and a number of other factors that pertain to the movement of 
data such as vector stride, the number of vector loads and stores, and 
the level of data re-use. Our goal is to build a useful awareness of these 
issues. We are not trying to build a comprehensive model of vector pipeline 
computing that might be used to predict performance. We simply want to
identify the kind of thinking that goes into the design of an effective vector 
pipeline code. We do not mention any particular machine. The literature 
is filled with case studies. 


1.4.1 Pipelining Arithmetic Operations 


The primary reason why vector computers are fast has to do with pipelin- 
ing. The concept of pipelining is best understood by making an analogy to 
assembly line production. Suppose the assembly of an individual automo- 
bile requires one minute at each of sixty workstations along an assembly 
line. If the line is well staffed and able to initiate the assembly of a new car 
every minute, then 1000 cars can be produced from scratch in about 1000 
+ 60 = 1060 minutes. For a work order of this size the line has an effective 
"vector speed" of 1000/1060 automobiles per minute. On the other hand, 
if the assembly line is understaffed and a new assembly can be initiated 
just once an hour, then 1000 hours are required to produce 1000 cars. In 
this case the line has an effective "scalar speed" of 1/60th automobile per 
minute. 

So it is with a pipelined vector operation such as the vector add $z = x + y$.
The scalar operations $z_i = x_i + y_i$ are the cars. The number of elements
is the size of the work order. If the start-to-finish time required for each
$z_i$ is $\tau$, then a pipelined, length n vector add could be completed in time
much less than $n\tau$. This gives vector speed. Without the pipelining, the
vector computation would proceed at a scalar rate and would approximately
require time $n\tau$ for completion.

Let us see how a sequence of floating point operations can be pipelined.
Floating point operations usually require several cycles to complete. For
example, a 3-cycle addition of two scalars $x$ and $y$ may proceed as in
Fig. 1.4.1. To visualize the operation, continue with the above metaphor
and think of the addition unit as an assembly line with three "work stations".
The input scalars $x$ and $y$ proceed along the assembly line spending
one cycle at each of three stations. The sum $z$ emerges after three cycles.
Note that when a single, "free standing" addition is performed, only one of
the three stations is active during the computation.

FIG. 1.4.1 A 3-Cycle Adder

Now consider a vector addition $z = x + y$. With pipelining, the $x$ and $y$
vectors are streamed through the addition unit. Once the pipeline is filled
and steady state reached, a $z_i$ is produced every cycle. In Fig. 1.4.2 we
depict what the pipeline might look like once this steady state is achieved.
In this case, vector speed is about three times scalar speed because the time
for an individual add is three cycles.

FIG. 1.4.2 Pipelined Addition (stage label: Adjust Exponents)


1.4.2 Vector Operations 


A vector pipeline computer comes with a repertoire of vector instructions, 
such as vector add, vector multiply, vector scale, dot product, and saxpy. 
We assume for clarity that these operations take place in vector registers. 
Vectors travel between the registers and memory by means of vector load 
and vector store instructions. 

An important attribute of a vector processor is the length of its vector
registers, which we designate by $v_L$. A length-n vector operation must be
broken down into subvector operations of length $v_L$ or less. Here is how such
a partitioning might be managed in the case of a vector addition $z = x + y$
where $x$ and $y$ are n-vectors:

    first = 1
    while first ≤ n
        last = min{n, first + v_L - 1}
        Vector load x(first:last).
        Vector load y(first:last).
        Vector add: z(first:last) = x(first:last) + y(first:last).
        Vector store z(first:last).
        first = last + 1
    end

A reasonable compiler for a vector computer would automatically generate
these vector instructions from a programmer specified z = x + y command.
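
The partitioning loop can be mimicked in Python to see how a long vector is cut into register-sized pieces. This is only an illustration: list slices stand in for vector loads and stores, and the names `strip_mined_add` and `v_len` are hypothetical.

```python
def strip_mined_add(x, y, v_len):
    """Compute z = x + y in chunks of at most v_len components."""
    n = len(x)
    z = [0.0] * n
    chunks = []                                  # record of chunk lengths
    first = 0
    while first < n:
        last = min(n, first + v_len)             # half-open range [first, last)
        xr = x[first:last]                       # stands in for a vector load
        yr = y[first:last]                       # stands in for a vector load
        z[first:last] = [a + b for a, b in zip(xr, yr)]   # vector add and store
        chunks.append(last - first)
        first = last
    return z, chunks


z, chunks = strip_mined_add(list(range(10)), [1] * 10, 4)
print(chunks)   # two full chunks followed by a shorter remainder
```

Only the final chunk can be shorter than `v_len`, which is why the remainder term receives separate treatment in the timing model that follows.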

1.4.3 The Vector Length Issue

Suppose the pipeline for the vector operation op takes $\tau_{op}$ cycles to "set
up." Assume that one component of the result is obtained per cycle once
the pipeline is filled. The time required to perform an n-dimensional op is
then given by

$$T_{op}(n) = (\tau_{op} + n)\mu, \qquad n \le v_L$$

where $\mu$ is the cycle time and $v_L$ is the length of the vector hardware.

If the vectors to be combined are longer than the vector hardware length,
then as we have seen the overall vector operation must be broken down into
hardware-manageable chunks. Thus, if

$$n = n_1 v_L + n_0, \qquad 0 \le n_0 < v_L,$$

then we assume that

$$
T_{op}(n) = \begin{cases}
n_1(\tau_{op} + v_L)\mu & n_0 = 0 \\
\left( n_1(\tau_{op} + v_L) + \tau_{op} + n_0 \right)\mu & n_0 \ne 0
\end{cases}
$$

specifies the overall time required to perform a length-n op. This simplifies
to

$$T_{op}(n) = \left( n + \tau_{op}\,\mathrm{ceil}(n/v_L) \right)\mu$$

where $\mathrm{ceil}(\alpha)$ is the smallest integer such that $\alpha \le \mathrm{ceil}(\alpha)$. If $\rho$ flops per
component are involved, then the effective rate of computation for general
$n$ is given by

$$R_{op}(n) = \frac{\rho n}{T_{op}(n)} = \frac{\rho}{\mu} \cdot \frac{1}{1 + \dfrac{\tau_{op}}{n}\,\mathrm{ceil}\!\left( \dfrac{n}{v_L} \right)}.$$

(If $\mu$ is in seconds, then $R_{op}$ is in flops per second.) The asymptotic rate of
performance is given by

$$\lim_{n \to \infty} R_{op}(n) = \frac{1}{1 + \tau_{op}/v_L} \cdot \frac{\rho}{\mu}.$$

As a way of assessing how serious the start-up overhead is for a vector
operation, Hockney and Jesshope (1988) define the quantity $n_{1/2}$ to be the
smallest $n$ for which half of peak performance is achieved, i.e.,

$$\frac{\rho\, n_{1/2}}{T_{op}(n_{1/2})} = \frac{1}{2} \cdot \frac{\rho}{\mu}.$$

Machines that have big $n_{1/2}$ factors do not perform well on short vector
operations.
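
The timing model is easy to experiment with numerically. The Python sketch below evaluates $T_{op}(n)$ both in its chunked form and in its simplified ceil form (in cycles, that is, with $\mu = 1$) and searches for $n_{1/2}$ by brute force. The function names and the sample values $\tau_{op} = 10$, $v_L = 64$ are assumptions chosen for illustration, not data from the text.

```python
def t_op(n, tau, v_len):
    """Cycles for a length-n op: n + tau * ceil(n / v_len)."""
    return n + tau * (-(-n // v_len))            # -(-a // b) is an integer ceiling


def t_op_chunked(n, tau, v_len):
    """Same quantity, computed from n = n1 * v_len + n0 as in the two-case formula."""
    n1, n0 = divmod(n, v_len)
    cycles = n1 * (tau + v_len)
    if n0 != 0:
        cycles += tau + n0                       # one extra, partially filled chunk
    return cycles


def n_half(tau, v_len):
    """Smallest n whose rate reaches half of the peak rho/mu, i.e. T_op(n) <= 2n."""
    if tau > v_len:
        raise ValueError("half of peak is never reached when tau exceeds v_len")
    n = 1
    while t_op(n, tau, v_len) > 2 * n:
        n += 1
    return n


print(n_half(10, 64))
```

Under this model the half-performance length equals the start-up cost whenever $\tau_{op} \le v_L$, which gives a quick way to read $n_{1/2}$ as a measure of pipeline overhead.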

Let us see what the above performance model says about the design 
of the matrix multiply update C = AB + C where A € IR"*?, B e IRP*", 
and C c IR"*^, Recall from 61.1.11 that there are six possible versions of 
the conventional algorithm and they correspond to the six possible loop 
orderings of 
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for i = lim 
for j = lim 
for k = lp 
Cài j) = A(i, k) B{k, j) T C(i, j) 
end 
end 
end 


This is the ijk variant and its innermost loop oversees a length-p dot prod- 
uct. Thus, our performance model predicts that 


Tijk = mnp + mn - ceil(p/v,) Tact 


cycles are required. A similar analysis for each of the other variants leads 
to the following table: 


MAP + mn - Taor(p/ v.) 
MAP + mn Tax(p/vi) 
mnp + MP: Tas (n/v,) 
mnp + np - T,az(m/v,) 
MNP + mp: T;az(n/v,) 
MNP + np: Ta (m/v;) 


We make a few observations based upon some elementary integer arithmetic
manipulation. Assume that $\tau_{sax}$ and $\tau_{dot}$ are roughly equal. If $m$, $n$, and
$p$ are all less than $v_L$, then the most efficient variants will have the longest
inner loops. If $m$, $n$, and $p$ are much bigger than $v_L$, then the distinction
between the six options is small.
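
Both observations can be checked numerically. The sketch below is illustrative only; the function name `matmul_cycles` and the sample start-up costs are assumptions of ours, not values from the text. It evaluates the cycle count of each loop ordering under the model.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def matmul_cycles(m, n, p, v_len, tau_dot, tau_sax):
    """Modeled cycle counts for the six loop orderings of C = AB + C."""
    base = m * n * p
    dot = base + m * n * tau_dot * ceil_div(p, v_len)       # inner loop over k
    sax_row = base + m * p * tau_sax * ceil_div(n, v_len)   # inner loop over j
    sax_col = base + n * p * tau_sax * ceil_div(m, v_len)   # inner loop over i
    return {"ijk": dot, "jik": dot,
            "ikj": sax_row, "kij": sax_row,
            "jki": sax_col, "kji": sax_col}


# Short vectors (all dimensions below v_len): the longest inner loop wins.
short = matmul_cycles(2, 3, 50, v_len=64, tau_dot=10, tau_sax=10)

# Dimensions that are large multiples of v_len: all six counts coincide.
long_ = matmul_cycles(6400, 12800, 19200, v_len=64, tau_dot=10, tau_sax=10)
```

In the short case the inner-loop lengths are $p = 50$, $n = 3$, and $m = 2$, and the dot product variants (inner loop of length $p$) have the smallest count.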


1.4.4 The Stride Issue 


The “layout” of a vector operand in memory often has a bearing on execution
speed. The key factor is stride. The stride of a stored floating point
vector is the distance (in logical memory locations) between the vector's
components. Accessing a row in a two-dimensional Fortran array is not a
unit stride operation because arrays are stored by column. In C, it is just
the opposite as matrices are stored by row. Nonunit stride vector operations
may interfere with the pipelining capability of a computer, degrading
performance.

To clarify the stride issue we consider how the six variants of matrix
multiplication “pull up” data from the A, B, and C matrices in the inner
loop. This is where the vector calculation occurs (dot product or saxpy)
and there are three possibilities:

    jki or kji:    for i = 1:m
                       C(i,j) = C(i,j) + A(i,k)B(k,j)
                   end
    ikj or kij:    for j = 1:n
                       C(i,j) = C(i,j) + A(i,k)B(k,j)
                   end
    ijk or jik:    for k = 1:p
                       C(i,j) = C(i,j) + A(i,k)B(k,j)
                   end


Here is a table that specifies the A, B, and C strides associated with each 
of these possibilities: 


    Variant       A Stride    B Stride    C Stride
    jki or kji    Unit        0           Unit
    ikj or kij    0           Non-Unit    Non-Unit
    ijk or jik    Non-Unit    Unit        0


Storage in column-major order is assumed. A stride of zero means that only 
a single array element is accessed in the inner loop. From the stride point 
of view, it is clear that we should favor the jki and kji variants. This may 
not coincide with a preference that is based on vector length considerations. 
Dilemmas of this type are typical in high performance computing. One goal 
(maximize vector length) can conflict with another (impose unit stride). 
Sometimes a vector stride/vector length conflict can be resolved through
the intelligent choice of data structures. Consider the gaxpy $y = Ax + y$
where $A \in \mathbb{R}^{n\times n}$ is symmetric. Assume that $n \le v_L$ for simplicity. If
A is stored conventionally and Algorithm 1.1.4 is used, then the central
computation entails $n$ unit stride saxpys each having length $n$:

    for j = 1:n
        y = A(:,j)x(j) + y
    end

Our simple execution model tells us that
$$T_1 = n(\tau_{sax} + n)$$
cycles are required.


In §1.2.7 we introduced the lower triangular storage scheme for symmetric
matrices and obtained this version of the gaxpy:

    for j = 1:n
        for i = 1:j-1
            y(i) = A.vec((i-1)n - i(i-1)/2 + j)x(j) + y(i)
        end
        for i = j:n
            y(i) = A.vec((j-1)n - j(j-1)/2 + i)x(j) + y(i)
        end
    end


Notice that the first i-loop does not define a unit stride saxpy. If we assume
that a length-$n$, nonunit stride saxpy is equivalent to $n$ unit-length saxpys
(a worst case scenario), then this implementation involves

$$T_2 = n\left(\frac{n+1}{2}\,\tau_{sax} + n\right)$$

cycles.
In §1.2.8 we developed the store-by-diagonal version:

    for i = 1:n
        y(i) = A.diag(i)x(i) + y(i)
    end
    for k = 1:n-1
        t = nk - k(k-1)/2
        {y = D(A,k)x + y}
        for i = 1:n-k
            y(i) = A.diag(i+t)x(i+k) + y(i)
        end
        {y = D(A,k)^T x + y}
        for i = 1:n-k
            y(i+k) = A.diag(i+t)x(i) + y(i+k)
        end
    end


In this case both inner loops define a unit stride vector multiply (vm) and
our model of execution predicts

$$T_3 = n(2\tau_{vm} + n)$$

cycles.
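
To make the store-by-diagonal indexing concrete, here is a small Python sketch using 0-based lists. The helper names `pack_by_diagonal` and `gaxpy_by_diagonal` are ours and purely illustrative. The packed array lists the main diagonal first and then each superdiagonal in turn, so that (in the 1-based notation of the text) A.diag(i + nk - k(k-1)/2) holds $a_{i,i+k}$.

```python
def pack_by_diagonal(A):
    """Pack symmetric A (list of rows) diagonal by diagonal: main, 1st super, ..."""
    n = len(A)
    diag = []
    for k in range(n):
        diag.extend(A[i][i + k] for i in range(n - k))
    return diag


def gaxpy_by_diagonal(diag, x, y):
    """Return A*x + y where symmetric A is given in store-by-diagonal form."""
    n = len(x)
    y = list(y)
    for i in range(n):                      # main diagonal
        y[i] += diag[i] * x[i]
    for k in range(1, n):
        t = n * k - k * (k - 1) // 2        # offset of the kth superdiagonal
        for i in range(n - k):              # superdiagonal contribution
            y[i] += diag[t + i] * x[i + k]
        for i in range(n - k):              # subdiagonal contribution (symmetry)
            y[i + k] += diag[t + i] * x[i]
    return y


A = [[2, 1, 3],
     [1, 4, 5],
     [3, 5, 6]]
result = gaxpy_by_diagonal(pack_by_diagonal(A), [1, 2, 3], [1, 1, 1])
```

Every inner loop walks through `diag`, `x`, and `y` with consecutive indices, which is the unit stride property claimed above.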

The example shows how the choice of data structure can affect the stride
attributes of an algorithm. Store by diagonal seems attractive because it
represents the matrix compactly and has unit stride. However, a careful
which-is-best analysis would depend upon the values of $\tau_{sax}$ and $\tau_{vm}$ and
the precise penalties for nonunit stride computation and excess storage.
The complexity of the situation would call for careful benchmarking.


1.4.5 Thinking About Data Motion


Another important attribute of a matrix algorithm concerns the actual vol- 
ume of data that has to be moved around during execution. Matrices sit 
in memory but the computations that involve their entries take place in 
functional units. The control of memory traffic is crucial to performance 
in many computers. To continue with the factory metaphor used at the 
beginning of this section: Can we keep the superfast arithmetic units busy 
with enough deliveries of matrix data and can we ship the results back to
memory fast enough to avoid backlog? Fig. 1.4.3 depicts the typical situation
in an advanced uniprocessor environment.

[FIG. 1.4.3 Memory Hierarchy]

Details vary from machine to machine, but two "axioms" prevail:


• Each level in the hierarchy has a limited capacity and for economic
reasons this capacity is usually smaller as we ascend the hierarchy.


• There is a cost, sometimes relatively great, associated with the moving
of data between two levels in the hierarchy.


The design of an efficient matrix algorithm requires careful thinking about 
the flow of data in between the various levels of storage. The vector touch 
and data re-use issues are important in this regard. 


1.4.6 The Vector Touch Issue 


In many advanced computers, data is moved around in chunks, e.g., vectors. 
The time required to read or write a vector to memory is comparable to 
the time required to engage the vector in a dot product or saxpy. Thus, the 
number of vector touches associated with a matrix code is a very important 
statistic. By a “vector touch” we mean either a vector load or store.

Let's count the number of vector touches associated with an m-by-n
outer product. Assume that $m = m_1 v_L$ and $n = n_1 v_L$ where $v_L$ is the vector
hardware length. (See §1.4.3.) In this environment, the outer product
update $A = A + xy^T$ would be arranged as follows:


    for α = 1:m_1
        i = (α-1)v_L + 1:αv_L
        for β = 1:n_1
            j = (β-1)v_L + 1:βv_L
            A(i,j) = A(i,j) + x(i)y(j)^T
        end
    end


Each column of the submatrix A(i,j) must be loaded, updated, and then
stored. Not forgetting to account for the vector touches associated with x
and y we see that approximately

$$\sum_{\alpha=1}^{m_1}\left(1 + \sum_{\beta=1}^{n_1}(1 + 2v_L)\right) \approx 2 m_1 n$$


vector touches are required. (Low order terms do not contribute to the 
analysis.) 

Now consider the gaxpy update $y = Ax + y$ where $y \in \mathbb{R}^m$, $x \in \mathbb{R}^n$ and
$A \in \mathbb{R}^{m\times n}$. Breaking this computation down into segments of length $v_L$
gives


    for α = 1:m_1
        i = (α-1)v_L + 1:αv_L
        for β = 1:n_1
            j = (β-1)v_L + 1:βv_L
            y(i) = y(i) + A(i,j)x(j)
        end
    end


Again, each column of submatrix A(i,j) must be read but the only writing
to memory involves subvectors of y. Thus, the number of vector touches
for an m-by-n gaxpy is

$$\sum_{\alpha=1}^{m_1}\left(2 + \sum_{\beta=1}^{n_1}(1 + v_L)\right) \approx m_1 n.$$
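
The two touch counts can be evaluated exactly from the loop structure. The following Python sketch (the function names are ours, chosen for illustration) sums the touches block by block, so the low order terms that the approximations drop are included.

```python
def outer_product_touches(m1, n1, v_len):
    """Vector touches for the blocked outer product update A = A + x*y^T.

    Per alpha: one load of x(i). Per (alpha, beta): one load of y(j) plus a
    load and a store for each of the v_len columns of A(i,j).
    """
    return m1 * (1 + n1 * (1 + 2 * v_len))


def gaxpy_touches(m1, n1, v_len):
    """Vector touches for the blocked gaxpy y = A*x + y.

    Per alpha: one load and one store of y(i). Per (alpha, beta): one load of
    x(j) plus a load for each of the v_len columns of A(i,j).
    """
    return m1 * (2 + n1 * (1 + v_len))


ratio = outer_product_touches(10, 10, 64) / gaxpy_touches(10, 10, 64)
```

With $m_1 = n_1 = 10$ and $v_L = 64$ the ratio is about 1.98, in line with the factor of two predicted by the leading terms.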


The gaxpy count is half the number required by an identically sized outer
product. Thus, if a computation can be arranged in terms of either outer
products or gaxpys, then the latter is preferable from the vector touch
standpoint.


1.4.7 Blocking and Re-Use


A cache is a small high-speed memory situated in between the functional
units and main memory. See Fig. 1.4.3. Cache utilization colors performance
because it has a direct bearing upon how data flows in between the
functional units and the lower levels of memory.

To illustrate this we consider the computation of the matrix multiply 
update $C = AB + C$ where $A, B, C \in \mathbb{R}^{n\times n}$ reside in main memory.¹ All
data must pass through the cache on its way to the functional units where 
the floating point computations are carried out. If the cache is small and 
n is big, then the update must be broken down into smaller parts so that 
the cache can "gracefully" process the flow of data. 

One strategy is to block the B and C matrices,

$$B = [\,\underbrace{B_1}_{\ell} \;\cdots\; \underbrace{B_N}_{\ell}\,], \qquad C = [\,\underbrace{C_1}_{\ell} \;\cdots\; \underbrace{C_N}_{\ell}\,],$$

where we assume that $n = \ell N$. From the expansion

$$C_\alpha = AB_\alpha + C_\alpha = \sum_{k=1}^{n} A(:,k)B_\alpha(k,:) + C_\alpha$$

we obtain the following computational framework:


    for α = 1:N
        Load B_α and C_α into cache.
        for k = 1:n
            Load A(:,k) into cache and update C_α:
                C_α = A(:,k)B_α(k,:) + C_α
        end
        Store C_α in main memory.
    end


Note that if $M$ is the cache size measured in floating point words, then we
must have
$$2n\ell + n \le M. \qquad (1.4.1)$$


Let $\Gamma_1$ be the number of floating point numbers that flow (in either
direction) between cache and main memory. Note that every entry in B is loaded
into cache once, every entry in C is loaded into cache once and stored back
in main memory once, and every entry in A is loaded into cache $N = n/\ell$
times. It follows that

$$\Gamma_1 = 3n^2 + \frac{n^3}{\ell}.$$


¹ The discussion that follows would also apply if the matrices were on a disk
and needed to be brought into main memory.

In the interest of keeping data motion to a minimum, we choose $\ell$ to be as
large as possible subject to the constraint (1.4.1). We therefore set

$$\ell \approx \frac{M - n}{2n}$$

obtaining

$$\Gamma_1 \approx 3n^2 + \frac{2n^4}{M - n}.$$

(We use “$\approx$” to emphasize the approximate nature of our analysis.) If cache
is large enough to house the entire B and C matrices with room left over
for a column of A, then $\ell = n$ and $\Gamma_1 = 4n^2$. At the other extreme, if we
can just fit three columns in cache, then $\ell = 1$ and $\Gamma_1 \approx n^3$.

Now let us regard $A = (A_{\alpha\beta})$, $B = (B_{\alpha\beta})$, and $C = (C_{\alpha\beta})$ as N-by-N
block matrices with uniform block size $\ell = n/N$. With this blocking the
computation of

$$C_{\alpha\beta} = \sum_{\gamma=1}^{N} A_{\alpha\gamma}B_{\gamma\beta} + C_{\alpha\beta}, \qquad \alpha = 1{:}N, \quad \beta = 1{:}N$$

can be arranged as follows:


    for α = 1:N
        for β = 1:N
            Load C_αβ into cache.
            for γ = 1:N
                Load A_αγ and B_γβ into cache.
                C_αβ = C_αβ + A_αγ B_γβ
            end
            Store C_αβ in main memory.
        end
    end


In this case the main memory/cache traffic sums to

$$\Gamma_2 = 2n^2 + \frac{2n^3}{\ell}$$

because each entry in A and B is loaded $N = n/\ell$ times and each entry
in C is loaded once and stored once. We can minimize this by choosing $\ell$
to be as large as possible subject to the constraint that three blocks fit in
cache, i.e.,

$$3\ell^2 \le M.$$

Setting $\ell \approx \sqrt{M/3}$ gives

$$\Gamma_2 \approx 2n^2 + \frac{2n^3}{\sqrt{M/3}} = 2n^2 + 2\sqrt{3}\,\frac{n^3}{\sqrt{M}}.$$


The key quantity here is $n^2/M$, the ratio of matrix size (in floating point
words) to cache size. As this ratio grows we find that

$$\frac{\Gamma_1}{\Gamma_2} \approx \frac{3n^2 + 2n^4/(M-n)}{2n^2 + 2\sqrt{3}\,n^3/\sqrt{M}} \approx \frac{n}{\sqrt{3M}},$$

showing that the second blocking strategy is superior from the standpoint
of data motion to and from the cache. The fundamental conclusion to be
reached from all of this is that blocking affects data motion.
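
A quick numerical comparison makes the point. The sketch below is illustrative: the function names and the sample values of $n$ and $M$ are our own, and the block sizes are rounded down to integers without requiring that they divide $n$ evenly. It evaluates the traffic formulas $3n^2 + n^3/\ell$ and $2n^2 + 2n^3/\ell$ with the largest block size each strategy allows.

```python
import math


def traffic_column_blocking(n, M):
    """Gamma_1 = 3n^2 + n^3/ell with ell the largest integer with 2*n*ell + n <= M."""
    ell = min((M - n) // (2 * n), n)
    if ell < 1:
        raise ValueError("cache too small to hold three columns")
    return 3 * n**2 + n**3 / ell


def traffic_square_blocking(n, M):
    """Gamma_2 = 2n^2 + 2n^3/ell with ell the largest integer with 3*ell^2 <= M."""
    ell = min(math.isqrt(M // 3), n)
    if ell < 1:
        raise ValueError("cache too small to hold three 1-by-1 blocks")
    return 2 * n**2 + 2 * n**3 / ell


n, M = 1000, 30000                       # hypothetical matrix and cache sizes
gamma1 = traffic_column_blocking(n, M)   # ell = 14
gamma2 = traffic_square_blocking(n, M)   # ell = 100
```

For these sample sizes the square blocking moves roughly a third as many words as the column blocking.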


1.4.8 Block Matrix Data Structures 


We conclude this section with a discussion about block data structures. A 
programming language that supports two-dimensional arrays must have a 
convention for storing such a structure in memory. For example, Fortran 
stores two-dimensional arrays in column major order. This means that the 
entries within a column are contiguous in memory. Thus, if 24 storage
locations are allocated for $A \in \mathbb{R}^{4\times 6}$, then in traditional store-by-column
format the matrix entries are "lined up" in memory as depicted in Fig.
1.4.4. In other words, if $A \in \mathbb{R}^{m\times n}$ is stored in v(1:mn), then we identify
A(i,j) with v((j-1)m + i). For algorithms that access matrix data by
column this is a good arrangement since the column entries are contiguous
in memory.

[FIG. 1.4.4 Store by Column (4-by-6 case)]

[FIG. 1.4.5 Store-by-Blocks (4-by-6 case with 2-by-2 Blocks)]


In certain block matrix algorithms it is sometimes useful to store matri- 
ces by blocks rather than by column. Suppose, for example, that the matrix 
A above is a 2-by-3 block matrix with 2-by-2 blocks. In a store-by-column 
block scheme with store-by-column within each block, the 24 entries are 
arranged in memory as shown in Fig. 1.4.5. This data structure can be
attractive for block algorithms because the entries within a given block are 
contiguous in memory. 
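
The two layouts amount to two different index maps. The following Python sketch is illustrative only: the function names are ours, indices are 1-based to match the text, and the block dimensions are assumed to divide the matrix dimensions. It gives the storage position of A(i,j) under each scheme.

```python
def col_major_index(i, j, m):
    """1-based position of A(i,j) in v(1:mn) for an m-row matrix stored by column."""
    return (j - 1) * m + i


def block_index(i, j, m, r, c):
    """1-based position of A(i,j) when r-by-c blocks are stored by block column,
    with store-by-column inside each block (assumes r divides m)."""
    bi, ii = divmod(i - 1, r)          # block row, row offset inside the block
    bj, jj = divmod(j - 1, c)          # block column, column offset inside the block
    block_no = bj * (m // r) + bi      # blocks are ordered by column
    return block_no * r * c + jj * r + ii + 1


# 4-by-6 case with 2-by-2 blocks: the first block occupies positions 1..4.
first_block = [block_index(i, j, 4, 2, 2) for j in (1, 2) for i in (1, 2)]
```

In the 4-by-6 example the entries $a_{11}, a_{21}, a_{12}, a_{22}$ occupy positions 1 through 4 under store-by-blocks, whereas store-by-column places $a_{12}$ at position 5.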


Problems


P1.4.1 Consider the matrix product $D = ABC$ where $A \in \mathbb{R}^{n\times r}$, $B \in \mathbb{R}^{r\times n}$, and
$C \in \mathbb{R}^{n\times q}$. Assume that all the matrices are stored by column and that the time required
to execute a unit-stride saxpy operation of length $k$ is of the form $t(k) = (L + k)\mu$ where $L$
is a constant and $\mu$ is the cycle time. Based on this model, when is it more economical to
compute $D$ as $D = (AB)C$ instead of as $D = A(BC)$? Assume that all matrix multiplies
are done using the jki (gaxpy) algorithm.

P1.4.2 What is the total time spent in the jki variant on the saxpy operations assuming
that all the matrices are stored by column and that the time required to execute a
unit-stride saxpy operation of length $k$ is of the form $t(k) = (L + k)\mu$ where $L$ is a constant
and $\mu$ is the cycle time? Specialize the algorithm so that it efficiently handles the case
when A and B are n-by-n and upper triangular. Does it follow that the triangular
implementation is six times faster as the flop count suggests?

P1.4.3 Give an algorithm for computing $C = A^TBA$ where A and B are n-by-n and
B is symmetric. Arrays should be accessed in unit stride fashion within all innermost 
loops. 

P1.4.4 Suppose $A \in \mathbb{R}^{m\times n}$ is stored by column in A.col(1:mn). Assume that $m = \ell_1 M$
and $n = \ell_2 N$ and that we regard A as an M-by-N block matrix with $\ell_1$-by-$\ell_2$ blocks.
Given $i$, $j$, $\alpha$, and $\beta$ that satisfy $1 \le i \le \ell_1$, $1 \le j \le \ell_2$, $1 \le \alpha \le M$, and $1 \le \beta \le N$,
determine $k$ so that A.col(k) houses the $(i,j)$ entry of $A_{\alpha\beta}$. Give an algorithm that
overwrites A.col with A stored by block as in Figure 1.4.5. How big of a work array is
required?


Notes and References for Sec. 1.4

Two excellent expositions about vector computation are


J.J. Dongarra, F.G. Gustavson, and A. Karp (1984). “Implementing Linear Algebra
Algorithms for Dense Matrices on a Vector Pipeline Machine,” SIAM Review 26,
91-112.

J.M. Ortega and R.G. Voigt (1985). “Solution of Partial Differential Equations on Vector
and Parallel Computers,” SIAM Review 27, 149-240.


A very detailed look at matrix computations in hierarchical memory systems can be
found in


K. Gallivan, W. Jalby, U. Meier, and A.H. Sameh (1988). “Impact of Hierarchical Memory
Systems on Linear Algebra Algorithm Design,” Int'l J. Supercomputer Applic.
2, 12-48.


See also


W. Schönauer (1987). Scientific Computing on Vector Computers, North Holland,
Amsterdam.

R.W. Hockney and C.R. Jesshope (1988). Parallel Computers 2, Adam Hilger, Bristol
and Philadelphia.

where various models of vector processor performance are set forth. Papers on the
practical aspects of vector computing include


J.J. Dongarra and A. Hinds (1979). “Unrolling Loops in Fortran,” Software Practice
and Experience 9, 219-229.

J.J. Dongarra and S. Eisenstat (1984). “Squeezing the Most Out of an Algorithm in
Cray Fortran,” ACM Trans. Math Soft. 10, 221-230.

B.L. Buzbee (1986). “A Strategy for Vectorization,” Parallel Computing 3, 187-192.

K. Gallivan, W. Jalby, and U. Meier (1987). “The Use of BLAS3 in Linear Algebra on a
Parallel Processor with a Hierarchical Memory,” SIAM J. Sci. and Stat. Comp. 8,
1079-1084.

J.J. Dongarra and D. Walker (1995). “Software Libraries for Linear Algebra Computations
on High Performance Computers,” SIAM Review 37, 151-180.


Chapter 2 


Matrix Analysis 


§2.1 Basic Ideas from Linear Algebra

§2.2 Vector Norms

§2.3 Matrix Norms

§2.4 Finite Precision Matrix Computations

§2.5 Orthogonality and the SVD

§2.6 Projections and the CS Decomposition

§2.7 The Sensitivity of Square Linear Systems


The analysis and derivation of algorithms in the matrix computation
area requires a facility with certain aspects of linear algebra. Some of the
basics are reviewed in §2.1. Norms and their manipulation are covered in
§2.2 and §2.3. In §2.4 we develop a model of finite precision arithmetic and
then use it in a typical roundoff analysis.

The next two sections deal with orthogonality, which has a prominent
role to play in matrix computations. The singular value decomposition
and the CS decomposition are a pair of orthogonal reductions that provide
critical insight into the important notions of rank and distance between
subspaces. In §2.7 we examine how the solution of a linear system $Ax = b$
changes if $A$ and $b$ are perturbed. The important concept of matrix
condition is introduced.


Before You Begin

References that complement this chapter include Forsythe and Moler
(1967), Stewart (1973), Stewart and Sun (1990), and Higham (1996).


2.1 Basic Ideas from Linear Algebra


This section is a quick review of linear algebra. Readers who wish a more 
detailed coverage should consult the references at the end of the section. 


2.1.1 Independence, Subspace, Basis, and Dimension


A set of vectors $\{a_1,\ldots,a_n\}$ in $\mathbb{R}^m$ is linearly independent if $\sum_{j=1}^{n} \alpha_j a_j = 0$
implies $\alpha(1{:}n) = 0$. Otherwise, a nontrivial combination of the $a_j$ is zero
and $\{a_1,\ldots,a_n\}$ is said to be linearly dependent.

A subspace of $\mathbb{R}^m$ is a subset that is also a vector space. Given a
collection of vectors $a_1,\ldots,a_n \in \mathbb{R}^m$, the set of all linear combinations of
these vectors is a subspace referred to as the span of $\{a_1,\ldots,a_n\}$:

$$\mathrm{span}\{a_1,\ldots,a_n\} = \left\{ \sum_{j=1}^{n} \beta_j a_j : \beta_j \in \mathbb{R} \right\}.$$

If $\{a_1,\ldots,a_n\}$ is independent and $b \in \mathrm{span}\{a_1,\ldots,a_n\}$, then $b$ is a unique
linear combination of the $a_j$.

If $S_1,\ldots,S_k$ are subspaces of $\mathbb{R}^m$, then their sum is the subspace defined
by $S = \{\, a_1 + a_2 + \cdots + a_k : a_i \in S_i,\ i = 1{:}k \,\}$. $S$ is said to be a direct sum
if each $v \in S$ has a unique representation $v = a_1 + \cdots + a_k$ with $a_i \in S_i$.
In this case we write $S = S_1 \oplus \cdots \oplus S_k$. The intersection of the $S_i$ is also
a subspace, $S = S_1 \cap S_2 \cap \cdots \cap S_k$.

The subset $\{a_{i_1},\ldots,a_{i_k}\}$ is a maximal linearly independent subset of
$\{a_1,\ldots,a_n\}$ if it is linearly independent and is not properly contained in any
linearly independent subset of $\{a_1,\ldots,a_n\}$. If $\{a_{i_1},\ldots,a_{i_k}\}$ is maximal,
then $\mathrm{span}\{a_1,\ldots,a_n\} = \mathrm{span}\{a_{i_1},\ldots,a_{i_k}\}$ and $\{a_{i_1},\ldots,a_{i_k}\}$ is a basis
for $\mathrm{span}\{a_1,\ldots,a_n\}$. If $S \subseteq \mathbb{R}^m$ is a subspace, then it is possible to find
independent basic vectors $a_1,\ldots,a_k \in S$ such that $S = \mathrm{span}\{a_1,\ldots,a_k\}$.
All bases for a subspace $S$ have the same number of elements. This number
is the dimension and is denoted by $\dim(S)$.


2.1.2 Range, Null Space, and Rank 


There are two important subspaces associated with an m-by-n matrix $A$.
The range of $A$ is defined by

$$\mathrm{ran}(A) = \{\, y \in \mathbb{R}^m : y = Ax \text{ for some } x \in \mathbb{R}^n \,\},$$

and the null space of $A$ is defined by

$$\mathrm{null}(A) = \{\, x \in \mathbb{R}^n : Ax = 0 \,\}.$$

If $A = [\,a_1,\ldots,a_n\,]$ is a column partitioning, then

$$\mathrm{ran}(A) = \mathrm{span}\{a_1,\ldots,a_n\}.$$

The rank of a matrix $A$ is defined by

$$\mathrm{rank}(A) = \dim(\mathrm{ran}(A)).$$

It can be shown that $\mathrm{rank}(A) = \mathrm{rank}(A^T)$. We say that $A \in \mathbb{R}^{m\times n}$ is rank
deficient if $\mathrm{rank}(A) < \min\{m, n\}$. If $A \in \mathbb{R}^{m\times n}$, then

$$\dim(\mathrm{null}(A)) + \mathrm{rank}(A) = n.$$


2.1.3 Matrix Inverse

The n-by-n identity matrix $I_n$ is defined by the column partitioning

$$I_n = [\,e_1,\ldots,e_n\,]$$

where $e_k$ is the kth “canonical” vector:

$$e_k = (\underbrace{0,\ldots,0}_{k-1},\,1,\,\underbrace{0,\ldots,0}_{n-k})^T.$$

The canonical vectors arise frequently in matrix analysis and if their
dimension is ever ambiguous, we use superscripts, i.e., $e_k^{(n)} \in \mathbb{R}^n$.

If $A$ and $X$ are in $\mathbb{R}^{n\times n}$ and satisfy $AX = I$, then $X$ is the inverse of
$A$ and is denoted by $A^{-1}$. If $A^{-1}$ exists, then $A$ is said to be nonsingular.
Otherwise, we say $A$ is singular.

Several matrix inverse properties have an important role to play in matrix
computations. The inverse of a product is the reverse product of the
inverses:
$$(AB)^{-1} = B^{-1}A^{-1}. \qquad (2.1.1)$$
The transpose of the inverse is the inverse of the transpose:
$$(A^{-1})^T = (A^T)^{-1} \equiv A^{-T}. \qquad (2.1.2)$$
The identity
$$B^{-1} = A^{-1} - B^{-1}(B - A)A^{-1} \qquad (2.1.3)$$
shows how the inverse changes if the matrix changes.

The Sherman-Morrison-Woodbury formula gives a convenient expression
for the inverse of $(A + UV^T)$ where $A \in \mathbb{R}^{n\times n}$ and $U$ and $V$ are n-by-k:

$$(A + UV^T)^{-1} = A^{-1} - A^{-1}U(I + V^TA^{-1}U)^{-1}V^TA^{-1}. \qquad (2.1.4)$$

A rank $k$ correction to a matrix results in a rank $k$ correction of the inverse.
In (2.1.4) we assume that both $A$ and $(I + V^TA^{-1}U)$ are nonsingular.

Any of these facts can be verified by just showing that the “proposed”
inverse does the job. For example, here is how to confirm (2.1.3):

$$B\left(A^{-1} - B^{-1}(B - A)A^{-1}\right) = BA^{-1} - (B - A)A^{-1} = I.$$
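
The same “does the job” style of check can be carried out numerically. The sketch below is an illustration rather than part of the original text: it specializes (2.1.4) to $k = 1$ (a rank-one update $A + uv^T$), uses exact rational arithmetic so that the comparison is not blurred by roundoff, and the helper names and the sample data are our own.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Exact inverse of a nonsingular 2-by-2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def rank_one_update_inverse(A_inv, u, v):
    """(A + u v^T)^{-1} computed from A^{-1} via (2.1.4) with k = 1."""
    n = len(u)
    Au = [sum(A_inv[i][j] * u[j] for j in range(n)) for i in range(n)]  # A^{-1} u
    vA = [sum(v[i] * A_inv[i][j] for i in range(n)) for j in range(n)]  # v^T A^{-1}
    denom = 1 + sum(v[i] * Au[i] for i in range(n))                     # 1 + v^T A^{-1} u
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
            for i in range(n)]


A = [[2, 1], [1, 3]]
u, v = [1, 2], [3, 1]
B = [[A[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)]   # A + u v^T
updated = rank_one_update_inverse(inverse_2x2(A), u, v)
direct = inverse_2x2(B)
```

Here `updated` and `direct` agree entry by entry, as (2.1.4) predicts whenever $A$ and $1 + v^TA^{-1}u$ are nonsingular.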


2.1.4 The Determinant 
If $A = (a) \in \mathbb{R}^{1\times 1}$, then its determinant is given by $\det(A) = a$. The
determinant of $A \in \mathbb{R}^{n\times n}$ is defined in terms of order $n - 1$ determinants:

$$\det(A) = \sum_{j=1}^{n} (-1)^{j+1} a_{1j} \det(A_{1j}).$$

Here, $A_{1j}$ is an $(n-1)$-by-$(n-1)$ matrix obtained by deleting the first row
and jth column of $A$. Useful properties of the determinant include

$$\begin{array}{ll}
\det(AB) = \det(A)\det(B) & A, B \in \mathbb{R}^{n\times n} \\
\det(A^T) = \det(A) & A \in \mathbb{R}^{n\times n} \\
\det(cA) = c^n \det(A) & c \in \mathbb{R},\ A \in \mathbb{R}^{n\times n} \\
\det(A) \ne 0 \iff A \text{ is nonsingular} & A \in \mathbb{R}^{n\times n}
\end{array}$$
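
The recursive definition translates directly into code. The following Python sketch (the function name `det` and the sample matrix are ours) expands along the first row exactly as in the definition. It is intended only to illustrate the definition and the listed properties; the cost of cofactor expansion grows factorially with $n$, so it is not a practical algorithm.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # A_{1j}: delete the first row and the jth column (0-based j here).
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total


A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
A_T = [list(col) for col in zip(*A)]           # transpose
twoA = [[2 * a for a in row] for row in A]     # scalar multiple with c = 2
```

For this sample matrix the transpose has the same determinant and `det(twoA)` equals $2^3\det(A)$.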


2.1.5 Differentiation 


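Suppose $\alpha$ is a scalar and that $A(\alpha)$ is an m-by-n matrix with entries $a_{ij}(\alpha)$.
If $a_{ij}(\alpha)$ is a differentiable function of $\alpha$ for all $i$ and $j$, then by $\dot{A}(\alpha)$ we
mean the matrix


$$\dot{A}(\alpha) = \frac{d}{d\alpha}A(\alpha) = \left(\frac{d}{d\alpha}a_{ij}(\alpha)\right) = (\dot{a}_{ij}(\alpha)).$$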


The differentiation of a parameterized matrix turns out to be a handy way 
to examine the sensitivity of various matrix problems. 


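Problems


P2.1.1 Show that if $A \in \mathbb{R}^{m\times n}$ has rank $p$, then there exists an $X \in \mathbb{R}^{m\times p}$ and a
$Y \in \mathbb{R}^{n\times p}$ such that $A = XY^T$, where $\mathrm{rank}(X) = \mathrm{rank}(Y) = p$.

P2.1.2 Suppose $A(\alpha) \in \mathbb{R}^{m\times r}$ and $B(\alpha) \in \mathbb{R}^{r\times n}$ are matrices whose entries are
differentiable functions of the scalar $\alpha$. Show

$$\frac{d}{d\alpha}\left[A(\alpha)B(\alpha)\right] = \left[\frac{d}{d\alpha}A(\alpha)\right]B(\alpha) + A(\alpha)\left[\frac{d}{d\alpha}B(\alpha)\right].$$

P2.1.3 Suppose $A(\alpha) \in \mathbb{R}^{n\times n}$ has entries that are differentiable functions of the scalar
$\alpha$. Assuming $A(\alpha)$ is always nonsingular, show

$$\frac{d}{d\alpha}\left[A(\alpha)^{-1}\right] = -A(\alpha)^{-1}\left[\frac{d}{d\alpha}A(\alpha)\right]A(\alpha)^{-1}.$$

P2.1.4 Suppose $A \in \mathbb{R}^{n\times n}$, $b \in \mathbb{R}^n$ and that $\phi(x) = \frac{1}{2}x^TAx - x^Tb$. Show that the
gradient of $\phi$ is given by $\nabla\phi(x) = \frac{1}{2}(A^T + A)x - b$.

P2.1.5 Assume that both $A$ and $A + uv^T$ are nonsingular where $A \in \mathbb{R}^{n\times n}$ and $u, v \in \mathbb{R}^n$.
Show that if $x$ solves $(A + uv^T)x = b$, then it also solves a perturbed right hand side
problem of the form $Ax = b + \alpha u$. Give an expression for $\alpha$ in terms of $A$, $u$, and $v$.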


Notes and References for Sec. 2.1 


There are many introductory linear algebra texts. Among them, the following are
particularly useful:


P.R. Halmos (1958). Finite Dimensional Vector Spaces, 2nd ed., Van Nostrand-Reinhold,
Princeton.

S.J. Leon (1980). Linear Algebra with Applications. Macmillan, New York.

G. Strang (1993). Introduction to Linear Algebra, Wellesley-Cambridge Press, Wellesley
MA.

D. Lay (1994). Linear Algebra and Its Applications, Addison-Wesley, Reading, MA.

C. Meyer (1997). A Course in Applied Linear Algebra, SIAM Publications, Philadelphia,
PA.

More advanced treatments include Gantmacher (1959), Horn and Johnson (1985, 1991),
and


A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Ginn
(Blaisdell), Boston.

M. Marcus and H. Minc (1964). A Survey of Matrix Theory and Matrix Inequalities,
Allyn and Bacon, Boston.

J.N. Franklin (1968). Matrix Theory, Prentice Hall, Englewood Cliffs, NJ.

R. Bellman (1970). Introduction to Matrix Analysis, Second Edition, McGraw-Hill, New
York.

P. Lancaster and M. Tismenetsky (1985). The Theory of Matrices, Second Edition,
Academic Press, New York.

J.M. Ortega (1987). Matrix Theory: A Second Course, Plenum Press, New York.


2.2 Vector Norms 


Norms serve the same purpose on vector spaces that absolute value does 
on the real line: they furnish a measure of distance. More precisely, $\mathbb{R}^n$
together with a norm on $\mathbb{R}^n$ defines a metric space. Therefore, we have the
familiar notions of neighborhood, open sets, convergence, and continuity 
when working with vectors and vector-valued functions. 


2.2.1 Definitions 


A vector norm on $\mathbb{R}^n$ is a function $f:\mathbb{R}^n \to \mathbb{R}$ that satisfies the following
properties:

$$\begin{array}{ll}
f(x) \ge 0 & x \in \mathbb{R}^n, \quad (f(x) = 0 \text{ iff } x = 0) \\
f(x + y) \le f(x) + f(y) & x, y \in \mathbb{R}^n \\
f(\alpha x) = |\alpha|\, f(x) & \alpha \in \mathbb{R},\ x \in \mathbb{R}^n
\end{array}$$

We denote such a function with a double bar notation: $f(x) = \|x\|$. Subscripts
on the double bar are used to distinguish between various norms.
A useful class of vector norms are the p-norms defined by
$$\|x\|_p = \left(|x_1|^p + \cdots + |x_n|^p\right)^{1/p}, \qquad p \ge 1. \qquad (2.2.1)$$
Of these the 1, 2, and $\infty$ norms are the most important:
$$\begin{aligned}
\|x\|_1 &= |x_1| + \cdots + |x_n| \\
\|x\|_2 &= \left(|x_1|^2 + \cdots + |x_n|^2\right)^{1/2} = (x^Tx)^{1/2} \\
\|x\|_\infty &= \max_{1 \le i \le n} |x_i|
\end{aligned}$$

A unit vector with respect to the norm $\|\cdot\|$ is a vector $x$ that satisfies
$\|x\| = 1$.


2.2.2 Some Vector Norm Properties 
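A classic result concerning p-norms is the Hölder inequality:

$$|x^Ty| \le \|x\|_p \|y\|_q, \qquad \frac{1}{p} + \frac{1}{q} = 1. \qquad (2.2.2)$$

A very important special case of this is the Cauchy-Schwarz inequality:

$$|x^Ty| \le \|x\|_2 \|y\|_2. \qquad (2.2.3)$$

All norms on $\mathbb{R}^n$ are equivalent, i.e., if $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are norms on
$\mathbb{R}^n$, then there exist positive constants $c_1$ and $c_2$ such that

$$c_1 \|x\|_\alpha \le \|x\|_\beta \le c_2 \|x\|_\alpha \qquad (2.2.4)$$

for all $x \in \mathbb{R}^n$. For example, if $x \in \mathbb{R}^n$, then

$$\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2 \qquad (2.2.5)$$
$$\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty \qquad (2.2.6)$$
$$\|x\|_\infty \le \|x\|_1 \le n\,\|x\|_\infty \qquad (2.2.7)$$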

2.2.3 Absolute and Relative Error 


Suppose $\hat{x} \in \mathbb{R}^n$ is an approximation to $x \in \mathbb{R}^n$. For a given vector norm
$\|\cdot\|$ we say that

$$\epsilon_{abs} = \|\hat{x} - x\|$$

is the absolute error in $\hat{x}$. If $x \ne 0$, then

$$\epsilon_{rel} = \frac{\|\hat{x} - x\|}{\|x\|}$$

prescribes the relative error in $\hat{x}$. Relative error in the $\infty$-norm can be
translated into a statement about the number of correct significant digits
in $\hat{x}$. In particular, if
$$\frac{\|\hat{x} - x\|_\infty}{\|x\|_\infty} \approx 10^{-p},$$
then the largest component of $\hat{x}$ has approximately $p$ correct significant
digits.


Example 2.2.1 If $x = (1.234\ \ .05674)^T$ and $\hat{x} = (1.235\ \ .05128)^T$, then $\|\hat{x} - x\|_\infty / \|x\|_\infty$
$\approx .0043 \approx 10^{-3}$. Note that $\hat{x}_1$ has about three significant digits that are correct while
only one significant digit in $\hat{x}_2$ is correct.


2.2.4 Convergence

We say that a sequence $\{x^{(k)}\}$ of n-vectors converges to $x$ if

$$\lim_{k\to\infty} \|x^{(k)} - x\| = 0.$$

Note that because of (2.2.4), convergence in the $\alpha$-norm implies convergence
in the $\beta$-norm and vice versa.


Problems 


P2.2.1 Show that if $x \in \mathbb{R}^n$, then $\lim_{p\to\infty} \|x\|_p = \|x\|_\infty$.

P2.2.2 Prove the Cauchy-Schwarz inequality (2.2.3) by considering the inequality
$0 \le (ax + by)^T(ax + by)$ for suitable scalars $a$ and $b$.

P2.2.3 Verify that $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ are vector norms.

P2.2.4 Verify (2.2.5)-(2.2.7). When is equality achieved in each result?

P2.2.5 Show that in $\mathbb{R}^n$, $x^{(i)} \to x$ if and only if $x_k^{(i)} \to x_k$ for $k = 1{:}n$.

P2.2.6 Show that any vector norm on $\mathbb{R}^n$ is uniformly continuous by verifying the
inequality $|\, \|x\| - \|y\| \,| \le \|x - y\|$.

P2.2.7 Let $\|\cdot\|$ be a vector norm on $\mathbb{R}^m$ and assume $A \in \mathbb{R}^{m\times n}$. Show that if
$\mathrm{rank}(A) = n$, then $\|x\|_A = \|Ax\|$ is a vector norm on $\mathbb{R}^n$.

P2.2.8 Let $x$ and $y$ be in $\mathbb{R}^n$ and define $\psi:\mathbb{R} \to \mathbb{R}$ by $\psi(\alpha) = \|x - \alpha y\|_2$. Show that
$\psi$ is minimized when $\alpha = x^Ty / y^Ty$.

P2.2.9 (a) Verify that $\|x\|_p = (|x_1|^p + \cdots + |x_n|^p)^{1/p}$ is a vector norm on $\mathbb{C}^n$. (b) Show
that if $x \in \mathbb{C}^n$ then $\|x\|_p \le (\|\mathrm{Re}(x)\|_p + \|\mathrm{Im}(x)\|_p)$. (c) Find a constant $c_n$ such
that $c_n(\|\mathrm{Re}(x)\|_2 + \|\mathrm{Im}(x)\|_2) \le \|x\|_2$ for all $x \in \mathbb{C}^n$.

P2.2.10 Prove or disprove:

$$v \in \mathbb{R}^n \;\Rightarrow\; \|v\|_1 \|v\|_\infty \le \frac{1 + \sqrt{n}}{2}\,\|v\|_2^2.$$

Notes and References for Sec. 2.2 


Although a vector norm is "just" a generalization of the absolute value concept, there 
are some noteworthy subtleties:


J.D. Pryce (1984). “A New Measure of Relative Error for Vectors," SIAM J. Num. 
Anal 21, 202-21. 


2.3 Matrix Norms 


The analysis of matrix algorithms frequently requires use of matrix norms.
For example, the quality of a linear system solver may be poor if the matrix
of coefficients is “nearly singular.” To quantify the notion of near-singularity
we need a measure of distance on the space of matrices. Matrix
norms provide that measure.


2.3.1 Definitions


Since $\mathbb{R}^{m\times n}$ is isomorphic to $\mathbb{R}^{mn}$, the definition of a matrix norm should be
equivalent to the definition of a vector norm. In particular, $f:\mathbb{R}^{m\times n} \to \mathbb{R}$
is a matrix norm if the following three properties hold:

$$\begin{array}{ll}
f(A) \ge 0 & A \in \mathbb{R}^{m\times n}, \quad (f(A) = 0 \text{ iff } A = 0) \\
f(A + B) \le f(A) + f(B) & A, B \in \mathbb{R}^{m\times n} \\
f(\alpha A) = |\alpha|\, f(A) & \alpha \in \mathbb{R},\ A \in \mathbb{R}^{m\times n}
\end{array}$$

As with vector norms, we use a double bar notation with subscripts to
designate matrix norms, i.e., $\|A\| = f(A)$.
The most frequently used matrix norms in numerical linear algebra are
the Frobenius norm,

$$\|A\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2} \qquad (2.3.1)$$

and the p-norms

$$\|A\|_p = \sup_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p}. \qquad (2.3.2)$$

Note that the matrix p-norms are defined in terms of the vector p-norms
that we discussed in the previous section. The verification that (2.3.1) and
(2.3.2) are matrix norms is left as an exercise. It is clear that $\|A\|_p$ is
the p-norm of the largest vector obtained by applying $A$ to a unit p-norm
vector:

$$\|A\|_p = \sup_{x \ne 0} \left\| A\left(\frac{x}{\|x\|_p}\right) \right\|_p = \max_{\|x\|_p = 1} \|Ax\|_p.$$

It is important to understand that (2.3.1) and (2.3.2) define families
of norms; the 2-norm on $\mathbb{R}^{3\times 2}$ is a different function from the 2-norm on
$\mathbb{R}^{5\times 6}$. Thus, the easily verified inequality

$$\|AB\|_p \le \|A\|_p \|B\|_p, \qquad A \in \mathbb{R}^{m\times n},\ B \in \mathbb{R}^{n\times q} \qquad (2.3.3)$$

is really an observation about the relationship between three different norms.
Formally, we say that norms $f_1$, $f_2$, and $f_3$ on $\mathbb{R}^{m\times q}$, $\mathbb{R}^{m\times n}$, and $\mathbb{R}^{n\times q}$ are
mutually consistent if for all $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times q}$ we have $f_1(AB) \le
f_2(A) f_3(B)$.

Not all matrix norms satisfy the submultiplicative property

$$\|AB\| \le \|A\|\,\|B\|. \qquad (2.3.4)$$

For example, if $\|A\|_\Delta = \max |a_{ij}|$ and

$$A = B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$$

then $\|AB\|_\Delta > \|A\|_\Delta \|B\|_\Delta$. For the most part we work with norms that
satisfy (2.3.4).

The p-norms have the important property that for every $A \in \mathbb{R}^{m\times n}$ and
$x \in \mathbb{R}^n$ we have $\|Ax\|_p \le \|A\|_p \|x\|_p$. More generally, for any vector
norm $\|\cdot\|_\alpha$ on $\mathbb{R}^n$ and $\|\cdot\|_\beta$ on $\mathbb{R}^m$ we have $\|Ax\|_\beta \le \|A\|_{\alpha,\beta} \|x\|_\alpha$
where $\|A\|_{\alpha,\beta}$ is a matrix norm defined by

$$\|A\|_{\alpha,\beta} = \sup_{x \ne 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha}. \qquad (2.3.5)$$

We say that $\|\cdot\|_{\alpha,\beta}$ is subordinate to the vector norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$.
Since the set $\{x \in \mathbb{R}^n : \|x\|_\alpha = 1\}$ is compact and $\|\cdot\|_\beta$ is continuous, it
follows that

$$\|A\|_{\alpha,\beta} = \max_{\|x\|_\alpha = 1} \|Ax\|_\beta = \|Ax^*\|_\beta \qquad (2.3.6)$$

for some $x^* \in \mathbb{R}^n$ having unit $\alpha$-norm.


2.3.2 Some Matrix Norm Properties 


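The Frobenius and p-norms (especially $p = 1, 2, \infty$) satisfy certain
inequalities that are frequently used in the analysis of matrix computations. For
$A \in \mathbb{R}^{m\times n}$ we have

$$\|A\|_2 \le \|A\|_F \le \sqrt{n}\,\|A\|_2 \qquad (2.3.7)$$
$$\max_{i,j} |a_{ij}| \le \|A\|_2 \le \sqrt{mn}\,\max_{i,j} |a_{ij}| \qquad (2.3.8)$$
$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij}| \qquad (2.3.9)$$
$$\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{n} |a_{ij}| \qquad (2.3.10)$$
$$\frac{1}{\sqrt{n}}\,\|A\|_\infty \le \|A\|_2 \le \sqrt{m}\,\|A\|_\infty \qquad (2.3.11)$$
$$\frac{1}{\sqrt{m}}\,\|A\|_1 \le \|A\|_2 \le \sqrt{n}\,\|A\|_1 \qquad (2.3.12)$$

If $A \in \mathbb{R}^{m\times n}$, $1 \le i_1 \le i_2 \le m$, and $1 \le j_1 \le j_2 \le n$, then

$$\|A(i_1{:}i_2,\, j_1{:}j_2)\|_p \le \|A\|_p. \qquad (2.3.13)$$

The proofs of these relations are not hard and are left as exercises.
A sequence $\{A^{(k)}\} \in \mathbb{R}^{m\times n}$ converges if $\lim_{k\to\infty} \|A^{(k)} - A\| = 0$.
Choice of norm is irrelevant since all norms on $\mathbb{R}^{m\times n}$ are equivalent.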
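
Since the 1, $\infty$, and Frobenius norms are cheap to compute, the inequalities (2.3.7), (2.3.8), (2.3.11), and (2.3.12) give an inexpensive bracket on $\|A\|_2$. The sketch below is an illustration with function names of our choosing; it returns the tightest lower and upper bounds these four results provide.

```python
import math


def norm_1(A):
    """Maximum absolute column sum, as in (2.3.9)."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def norm_inf(A):
    """Maximum absolute row sum, as in (2.3.10)."""
    return max(sum(abs(a) for a in row) for row in A)


def norm_fro(A):
    """Frobenius norm, as in (2.3.1)."""
    return math.sqrt(sum(a * a for row in A for a in row))


def two_norm_bracket(A):
    """Lower and upper bounds on the 2-norm from (2.3.7), (2.3.8), (2.3.11), (2.3.12)."""
    m, n = len(A), len(A[0])
    biggest = max(abs(a) for row in A for a in row)
    lower = max(norm_fro(A) / math.sqrt(n), biggest,
                norm_inf(A) / math.sqrt(n), norm_1(A) / math.sqrt(m))
    upper = min(norm_fro(A), math.sqrt(m * n) * biggest,
                math.sqrt(m) * norm_inf(A), math.sqrt(n) * norm_1(A))
    return lower, upper


lo, hi = two_norm_bracket([[3.0, 0.0], [0.0, 4.0]])
```

For the diagonal matrix diag(3, 4), whose 2-norm is 4, the bracket is [4, 5].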


2.3.3 The Matrix 2-Norm 


A nice feature of the matrix 1-norm and the matrix oo-norm is that they 
are easily computed from (2.3.9) and (2.3.10). A characterization of the 
2-norm is considerably more complicated. 


Theorem 2.3.1 If $A \in \mathbb{R}^{m\times n}$, then there exists a unit 2-norm n-vector $z$
such that $A^TAz = \mu^2 z$ where $\mu = \|A\|_2$.


Proof. Suppose $z \in \mathbb{R}^n$ is a unit vector such that $\|Az\|_2 = \|A\|_2$. Since
$z$ maximizes the function

$$g(x) = \frac{1}{2}\,\frac{\|Ax\|_2^2}{\|x\|_2^2} = \frac{1}{2}\,\frac{x^TA^TAx}{x^Tx}$$

it follows that it satisfies $\nabla g(z) = 0$ where $\nabla g$ is the gradient of $g$. But a
tedious differentiation shows that for $i = 1{:}n$

$$\frac{\partial g(z)}{\partial z_i} = \left[ (z^Tz) \sum_{j=1}^{n} (A^TA)_{ij} z_j - (z^TA^TAz)\, z_i \right] \Big/ (z^Tz)^2.$$

In vector notation this says $A^TAz = (z^TA^TAz)z$. The theorem follows by
setting $\mu = \|Az\|_2$. □


The theorem implies that $\|A\|_2^2$ is a zero of the polynomial $p(\lambda) =
\det(A^TA - \lambda I)$. In particular, the 2-norm of $A$ is the square root of the
largest eigenvalue of $A^TA$. We have much more to say about eigenvalues in
Chapters 7 and 8. For now, we merely observe that 2-norm computation
is iterative and decidedly more complicated than the computation of the
matrix 1-norm or $\infty$-norm. Fortunately, if the object is to obtain an
order-of-magnitude estimate of $\|A\|_2$, then (2.3.7), (2.3.11), or (2.3.12) can be
used.

Ás another example of "norm analysis," here is a handy result for 2- 
norm eetimation. 


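Corollary 2.3.2 If $A \in \mathbb{R}^{m \times n}$, then
$\| A \|_2 \le \sqrt{\| A \|_1 \| A \|_\infty}$.


Proof. If $z \ne 0$ is such that $A^T A z = \mu^2 z$ with $\mu = \| A \|_2$, then
$\mu^2 \| z \|_1 = \| A^T A z \|_1 \le \| A^T \|_1 \| A \|_1 \| z \|_1 = \| A \|_\infty \| A \|_1 \| z \|_1$. □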
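
To make the iterative nature of 2-norm computation concrete, here is a hedged sketch that estimates $\| A \|_2$ by power iteration on $A^T A$, which is one simple way (not necessarily the method the later chapters develop) to approximate the vector $z$ of Theorem 2.3.1. The function name `norm_2_estimate`, the starting vector, and the fixed iteration count are assumptions made for illustration. The estimate is compared with the cheap bound of Corollary 2.3.2; the test matrix is the one that appears later in Example 2.5.1, whose 2-norm is 3.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm_2_estimate(A, iters=200):
    """Estimate ||A||_2 by power iteration on A^T A (sketch, no convergence test)."""
    At = transpose(A)
    z = [1.0] * len(A[0])              # assumed starting vector
    for _ in range(iters):
        w = matvec(At, matvec(A, z))   # w = A^T A z
        nrm = math.sqrt(sum(wi * wi for wi in w))
        z = [wi / nrm for wi in w]     # renormalize to unit 2-norm
    Az = matvec(A, z)
    return math.sqrt(sum(v * v for v in Az))   # ||Az||_2 with ||z||_2 = 1

A = [[0.96, 1.72],
     [2.28, 0.96]]

norm_1 = max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))
norm_inf = max(sum(abs(a) for a in row) for row in A)
mu = norm_2_estimate(A)
bound = math.sqrt(norm_1 * norm_inf)   # Corollary 2.3.2
print(mu, bound)
```

The bound costs only one pass over the entries of $A$, whereas the iteration needs repeated matrix-vector products.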


2.3.4 Perturbations and the Inverse


We frequently use norms to quantify the effect of perturbations or to prove
that a sequence of matrices converges to a specified limit. As an illustration
of these norm applications, let us quantify the change in $A^{-1}$ as a function
of change in $A$.

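Lemma 2.3.3 If $F \in \mathbb{R}^{n \times n}$ and $\| F \|_p < 1$, then $I - F$
is nonsingular and

$$(I - F)^{-1} = \sum_{k=0}^{\infty} F^k$$

with

$$\| (I - F)^{-1} \|_p \le \frac{1}{1 - \| F \|_p} .$$

Proof. Suppose $I - F$ is singular. It follows that $(I - F)x = 0$ for some
nonzero $x$. But then $\| x \|_p = \| Fx \|_p$ implies $\| F \|_p \ge 1$, a
contradiction. Thus, $I - F$ is nonsingular. To obtain an expression for its
inverse consider the identity

$$\left( \sum_{k=0}^{N} F^k \right) (I - F) = I - F^{N+1} .$$

Since $\| F \|_p < 1$ it follows that $\lim_{k \to \infty} F^k = 0$ because
$\| F^k \|_p \le \| F \|_p^k$. Thus,

$$\left( \lim_{N \to \infty} \sum_{k=0}^{N} F^k \right) (I - F) = I .$$

It follows that $(I - F)^{-1} = \lim_{N \to \infty} \sum_{k=0}^{N} F^k$. From
this it is easy to show that

$$\| (I - F)^{-1} \|_p \le \sum_{k=0}^{\infty} \| F \|_p^k = \frac{1}{1 - \| F \|_p} . \qquad \square$$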
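
The series in Lemma 2.3.3 can be checked numerically. The sketch below, assuming a small sample matrix `F` of our own choosing with $\| F \|_\infty = 0.5$, forms the partial sum $I + F + \cdots + F^{60}$, verifies that it acts as an inverse of $I - F$ up to roundoff, and compares its $\infty$-norm with the bound $1/(1 - \| F \|_\infty)$.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def norm_inf(X):
    return max(sum(abs(x) for x in row) for row in X)

I = [[1.0, 0.0], [0.0, 1.0]]
F = [[0.2, -0.3],
     [0.1, 0.4]]                       # illustrative choice, ||F||_inf = 0.5

# Partial sum S_N = I + F + ... + F^N, built term by term.
S = [row[:] for row in I]
term = [row[:] for row in I]
for _ in range(60):
    term = matmul(term, F)
    S = [[S[i][j] + term[i][j] for j in range(2)] for i in range(2)]

I_minus_F = [[I[i][j] - F[i][j] for j in range(2)] for i in range(2)]
residual = matmul(S, I_minus_F)        # should be close to I
bound = 1.0 / (1.0 - norm_inf(F))      # Lemma 2.3.3 with p = infinity
print(residual, norm_inf(S), bound)
```

Because $\| F^{k} \|_\infty \le 0.5^k$, sixty terms are far more than double precision can resolve here; the count is deliberately generous.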
Note that $\| (I - F)^{-1} - I \|_p \le \| F \|_p / (1 - \| F \|_p)$ as a
consequence of the lemma. Thus, if $\epsilon \ll 1$, then $O(\epsilon)$
perturbations in $I$ induce $O(\epsilon)$ perturbations in the inverse. We next
extend this result to general matrices.

Theorem 2.3.4 If $A$ is nonsingular and $r \equiv \| A^{-1} E \|_p < 1$, then
$A + E$ is nonsingular and
$\| (A + E)^{-1} - A^{-1} \|_p \le \| E \|_p \| A^{-1} \|_p^2 / (1 - r)$.

Proof. Since $A$ is nonsingular $A + E = A(I - F)$ where $F = -A^{-1} E$.
Since $\| F \|_p = r < 1$ it follows from Lemma 2.3.3 that $I - F$ is nonsingular
and $\| (I - F)^{-1} \|_p \le 1/(1 - r)$. Now $(A + E)^{-1} = (I - F)^{-1} A^{-1}$
and so

$$\| (A + E)^{-1} \|_p \le \frac{\| A^{-1} \|_p}{1 - r} .$$

Equation (2.1.3) says that $(A + E)^{-1} - A^{-1} = -A^{-1} E (A + E)^{-1}$ and so
by taking norms we find

$$\| (A + E)^{-1} - A^{-1} \|_p \le \| A^{-1} \|_p \| E \|_p \| (A + E)^{-1} \|_p \le \frac{\| A^{-1} \|_p^2 \| E \|_p}{1 - r} . \qquad \square$$


Problems 


P2.3.1 Show $\| AB \|_p \le \| A \|_p \| B \|_p$ where $1 \le p \le \infty$.

P2.3.2 Let $B$ be any submatrix of $A$. Show that $\| B \|_p \le \| A \|_p$.

P2.3.3 Show that if $D = \mathrm{diag}(\mu_1, \ldots, \mu_k) \in \mathbb{R}^{m \times n}$
with $k = \min\{m, n\}$, then $\| D \|_p = \max |\mu_i|$.

P2.3.4 Verify (2.3.7) and (2.3.8). 

P2.3.5 Verify (2.3.9) and (2.3.10). 

P2.3.6 Verify (2.3.11) and (2.3.12). 

P2.3.7 Verify (2.3.13). 

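P2.3.8 Show that if $0 \ne s \in \mathbb{R}^n$ and $E \in \mathbb{R}^{m \times n}$, then

$$\left\| E \left( I - \frac{s s^T}{s^T s} \right) \right\|_F^2 = \| E \|_F^2 - \frac{\| Es \|_2^2}{s^T s} .$$

P2.3.9 Suppose $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$. Show that if
$E = u v^T$ then $\| E \|_F = \| E \|_2 = \| u \|_2 \| v \|_2$ and that
$\| E \|_\infty \le \| u \|_\infty \| v \|_1$.

P2.3.10 Suppose $A \in \mathbb{R}^{m \times n}$, $y \in \mathbb{R}^m$, and
$0 \ne s \in \mathbb{R}^n$. Show that $E = (y - As) s^T / s^T s$ has the smallest
2-norm of all m-by-n matrices $E$ that satisfy $(A + E)s = y$.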


Notes and References for Sec. 2.3 
For deeper issues concerning matrix/vector norms, see


F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2,
137-44.

L. Mirsky (1960). "Symmetric Gauge Functions and Unitarily Invariant Norms," Quart.
J. Math. 11, 50-59.

A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Dover
Publications, New York.

N.J. Higham (1992). "Estimating the Matrix p-Norm," Numer. Math. 62, 539-556.


2.4 Finite Precision Matrix Computations 


In part, rounding errors are what makes the matrix computation area so
nontrivial and interesting. In this section we set up a model of floating point
arithmetic and then use it to develop error bounds for floating point dot
products, saxpy's, matrix-vector products and matrix-matrix products. For
a more comprehensive treatment than what we offer, see Higham (1996) or
Wilkinson (1965). The coverage in Forsythe and Moler (1967) and Stewart
(1973) is also excellent.


2.4.1 The Floating Point Numbers 


When calculations are performed on a computer, each arithmetic operation
is generally affected by roundoff error. This error arises because the
machine hardware can only represent a subset of the real numbers. We
denote this subset by $F$ and refer to its elements as floating point numbers.
Following conventions set forth in Forsythe, Malcolm, and Moler (1977, pp.
10-29), the floating point number system on a particular computer is
characterized by four integers: the base $\beta$, the precision $t$, and the
exponent range $[L, U]$. In particular, $F$ consists of all numbers $f$ of the form

$$f = \pm .d_1 d_2 \ldots d_t \times \beta^e, \qquad 0 \le d_i < \beta, \quad d_1 \ne 0, \quad L \le e \le U$$

together with zero. Notice that for a nonzero $f \in F$ we have
$m \le |f| \le M$ where

$$m = \beta^{L-1} \quad \text{and} \quad M = \beta^U (1 - \beta^{-t}). \tag{2.4.1}$$

As an example, if $\beta = 2$, $t = 3$, $L = 0$, and $U = 2$, then the non-negative
elements of $F$ are represented by hash marks on the axis displayed in Fig.
2.4.1. Notice that the floating point numbers are not equally spaced.


FIGURE 2.4.1 Sample Floating Point Number System


A typical value for $(\beta, t, L, U)$ might be $(2, 56, -64, 64)$.
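
The small example system is easy to enumerate exactly. The sketch below uses Python's `fractions` module; the function name `floating_point_set` is our own, hypothetical helper. It lists the non-negative elements of $F$ for $(\beta, t, L, U) = (2, 3, 0, 2)$, and evaluates $m$ and $M$ from (2.4.1).

```python
from fractions import Fraction

def floating_point_set(beta, t, L, U):
    """Non-negative elements of F for base beta, precision t, exponents L..U."""
    values = {Fraction(0)}
    # Normalized mantissas .d1 d2 ... dt with d1 != 0, as integers over beta**t.
    for mant in range(beta ** (t - 1), beta ** t):
        for e in range(L, U + 1):
            values.add(Fraction(mant, beta ** t) * Fraction(beta) ** e)
    return sorted(values)

F = floating_point_set(2, 3, 0, 2)
m = Fraction(2) ** (0 - 1)                          # beta^(L-1)
M = Fraction(2) ** 2 * (1 - Fraction(1, 2 ** 3))    # beta^U (1 - beta^-t)
gaps = [b - a for a, b in zip(F[1:], F[2:])]        # spacing between positive elements
print([float(f) for f in F])
print(sorted(set(gaps)))
```

The list contains zero and twelve positive numbers between $m = 1/2$ and $M = 7/2$, and the gaps double each time the exponent increases, which is the unequal spacing noted in the text.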


2.4.2 A Model of Floating Point Arithmetic 


To make general pronouncements about the effect of rounding errors on a
given algorithm, it is necessary to have a model of computer arithmetic on
$F$. To this end define the set $G$ by

$$G = \{ x \in \mathbb{R} : m \le |x| \le M \} \cup \{0\} \tag{2.4.2}$$

and the operator $fl : G \to F$ by

    fl(x) = nearest c in F to x, with ties handled
            by rounding away from zero.

The $fl$ operator can be shown to satisfy

$$fl(x) = x(1 + \epsilon), \qquad |\epsilon| \le u \tag{2.4.3}$$

where $u$ is the unit roundoff defined by

$$u = \tfrac{1}{2} \beta^{1-t}. \tag{2.4.4}$$


Let $a$ and $b$ be any two floating point numbers and let "op" denote any
of the four arithmetic operations $+, -, \times, \div$. If $a \ \mathrm{op}\ b \in G$,
then in our model of floating point arithmetic we assume that the computed
version of $(a \ \mathrm{op}\ b)$ is given by $fl(a \ \mathrm{op}\ b)$. It follows
that $fl(a \ \mathrm{op}\ b) = (a \ \mathrm{op}\ b)(1 + \epsilon)$ with
$|\epsilon| \le u$. Thus,

$$\frac{| fl(a \ \mathrm{op}\ b) - (a \ \mathrm{op}\ b) |}{| a \ \mathrm{op}\ b |} \le u, \qquad a \ \mathrm{op}\ b \ne 0 \tag{2.4.5}$$

showing that there is small relative error associated with individual
arithmetic operations.¹ It is important to realize, however, that this is not
necessarily the case when a sequence of operations is involved.
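
The model can be tested against the arithmetic of an actual machine. The sketch below assumes Python floats are IEEE double precision numbers ($\beta = 2$, $t = 53$), so that $u = 2^{-53}$; note that IEEE arithmetic breaks ties by rounding to even rather than away from zero, which does not affect the bounds (2.4.3) and (2.4.5). The helper name `relative_error` is ours. Exact rational arithmetic is used to measure the errors.

```python
import sys
from fractions import Fraction

U_DOUBLE = Fraction(1, 2 ** 53)   # u = (1/2) * beta^(1-t) with beta = 2, t = 53

def relative_error(computed, exact):
    """Exact relative error |computed - exact| / |exact| as a Fraction."""
    return abs(Fraction(computed) - exact) / abs(exact)

# Rounding a real number that is not in F: x = 1/10, as in (2.4.3).
x = Fraction(1, 10)
err_store = relative_error(0.1, x)

# One arithmetic operation on two floating point numbers a and b, as in (2.4.5).
a, b = 0.1, 0.2
err_add = relative_error(a + b, Fraction(a) + Fraction(b))

print(float(err_store), float(err_add), float(U_DOUBLE))
print(Fraction(sys.float_info.epsilon) == 2 * U_DOUBLE)
```

The last line reflects a common source of confusion: the "machine epsilon" reported by many systems is the spacing of floating point numbers near 1, which is $2u$ in the notation used here.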


Example 2.4.1 If $\beta = 10$, $t = 3$ floating point arithmetic is used, then it can be
shown that $fl[fl(10^{-4} + 1) - 1] = 0$, implying a relative error of 1. On the other
hand the exact answer is given by $fl[10^{-4} + fl(1 - 1)] = 10^{-4}$. Floating point
arithmetic is not always associative.
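
Example 2.4.1 can be reproduced with Python's `decimal` module, which lets us simulate 3-digit base-10 arithmetic. This is a sketch under the assumption that a `Context` with `prec=3` and `ROUND_HALF_UP` (ties away from zero) is an adequate stand-in for the $fl$ operator; the exponent range $[L, U]$ is not modeled.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

# beta = 10, t = 3; ROUND_HALF_UP rounds ties away from zero.
ctx = Context(prec=3, rounding=ROUND_HALF_UP)

tiny = Decimal("0.0001")
one = Decimal(1)

left = ctx.subtract(ctx.add(tiny, one), one)     # fl(fl(10^-4 + 1) - 1)
right = ctx.add(tiny, ctx.subtract(one, one))    # fl(10^-4 + fl(1 - 1))
print(left, right)
```

The first grouping loses the small term entirely when it is added to 1, while the second grouping never exposes it to a large operand.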


If $a \ \mathrm{op}\ b \notin G$, then an arithmetic exception occurs. Overflow and
underflow result whenever $|a \ \mathrm{op}\ b| > M$ or $0 < |a \ \mathrm{op}\ b| < m$
respectively. The handling of these and other exceptions is hardware/system dependent.


2.4.3 Cancellation 


Another important aspect of finite precision arithmetic is the phenomenon
of catastrophic cancellation. Roughly speaking, this term refers to the
extreme loss of correct significant digits when small numbers are additively
computed from large numbers. A well-known example taken from Forsythe,
Malcolm and Moler (1977, pp. 14-16) is the computation of $e^{-a}$ via Taylor
series with $a > 0$. The roundoff error associated with this method is
approximately $u$ times the largest partial sum. For large $a$, this error can
actually be larger than the exact exponential and there will be no correct
digits in the answer no matter how many terms in the series are summed.
On the other hand, if enough terms in the Taylor series for $e^{a}$ are added and
the result reciprocated, then an estimate of $e^{-a}$ to full precision is attained.

¹ There are important examples of machines whose additive floating point operations
satisfy $fl(a \pm b) = (1 + \epsilon_1)a \pm (1 + \epsilon_2)b$ where
$|\epsilon_1|, |\epsilon_2| \le u$. In such an environment, the inequality
$|fl(a \pm b) - (a \pm b)| \le u|a \pm b|$ need not hold.
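
The two ways of computing $e^{-a}$ can be compared in double precision. The sketch below is illustrative only: the function name `exp_taylor`, the choice $a = 30$, and the fixed count of 200 terms are assumptions, and the library value `math.exp(-a)` is taken as the reference.

```python
import math

def exp_taylor(x, terms=200):
    """Sum the Taylor series of e^x term by term in double precision."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= x / k          # term is now x^k / k!
        total += term
    return total

a = 30.0
exact = math.exp(-a)                 # library value used as the reference
direct = exp_taylor(-a)              # alternating series: heavy cancellation
reciprocal = 1.0 / exp_taylor(a)     # all terms positive, then reciprocate

err_direct = abs(direct - exact) / exact
err_reciprocal = abs(reciprocal - exact) / exact
print(err_direct, err_reciprocal)
```

For this value of $a$ the largest term in the alternating series is of order $10^{11}$, so roundoff of size $u$ times that term swamps a result of order $10^{-13}$, whereas the reciprocal of the all-positive series is accurate to nearly full precision.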


2.4.4 The Absolute Value Notation 


Before we proceed with the roundoff analysis of some basic matrix
calculations, we acquire some useful notation. Suppose
$A \in \mathbb{R}^{m \times n}$ and that we wish to quantify the errors associated
with its floating point representation. Denoting the stored version of $A$ by
$fl(A)$, we see that

$$[fl(A)]_{ij} = fl(a_{ij}) = a_{ij}(1 + \epsilon_{ij}), \qquad |\epsilon_{ij}| \le u \tag{2.4.6}$$

for all $i$ and $j$. A better way to say the same thing results if we adopt two
conventions. If $A$ and $B$ are in $\mathbb{R}^{m \times n}$, then

$$B = |A| \;\Rightarrow\; b_{ij} = |a_{ij}|, \quad i = 1{:}m, \ j = 1{:}n$$
$$B \le A \;\Rightarrow\; b_{ij} \le a_{ij}, \quad i = 1{:}m, \ j = 1{:}n.$$

With this notation we see that (2.4.6) has the form

$$| fl(A) - A | \le u |A| .$$

A relation such as this can be easily turned into a norm inequality, e.g.,
$\| fl(A) - A \|_1 \le u \| A \|_1$. However, when quantifying the rounding errors
in a matrix manipulation, the absolute value notation can be a lot more
informative because it provides a comment on each $(i, j)$ entry.


2.4.5 Roundoff in Dot Products 


We begin our study of finite precision matrix computations by considering
the rounding errors that result in the standard dot product algorithm:

    s = 0
    for k = 1:n
        s = s + x_k y_k                                   (2.4.7)
    end

Here, $x$ and $y$ are n-by-1 floating point vectors.

In trying to quantify the rounding errors in this algorithm, we are
immediately confronted with a notational problem: the distinction between
computed and exact quantities. When the underlying computations
are clear, we shall use the $fl(\cdot)$ operator to signify computed quantities.
Thus, $fl(x^T y)$ denotes the computed output of (2.4.7). Let us bound
$| fl(x^T y) - x^T y |$. If

$$s_p = fl\left( \sum_{k=1}^{p} x_k y_k \right),$$

then $s_1 = x_1 y_1 (1 + \delta_1)$ with $|\delta_1| \le u$ and for $p = 2{:}n$

$$s_p = fl(s_{p-1} + fl(x_p y_p)) = \left( s_{p-1} + x_p y_p (1 + \delta_p) \right)(1 + \epsilon_p), \qquad |\delta_p|, |\epsilon_p| \le u. \tag{2.4.8}$$

A little algebra shows that

$$fl(x^T y) = s_n = \sum_{k=1}^{n} x_k y_k (1 + \gamma_k)$$

where

$$(1 + \gamma_k) = (1 + \delta_k) \prod_{j=k}^{n} (1 + \epsilon_j)$$

with the convention that $\epsilon_1 = 0$. Thus,

$$| fl(x^T y) - x^T y | \le \sum_{k=1}^{n} | x_k y_k | \, | \gamma_k | . \tag{2.4.9}$$

To proceed further, we must bound the quantities $|\gamma_k|$ in terms of $u$. The
following result is useful for this purpose.


Lemma 2.4.1 If $(1 + \alpha) = \prod_{k=1}^{n} (1 + \alpha_k)$ where
$|\alpha_k| \le u$ and $nu \le .01$, then $|\alpha| \le 1.01 nu$.


Proof. See Higham (1996, p. 75). □

Applying this result to (2.4.9) under the "reasonable" assumption $nu \le .01$
gives

$$| fl(x^T y) - x^T y | \le 1.01\, nu\, |x|^T |y| . \tag{2.4.10}$$

Notice that if $| x^T y | \ll |x|^T |y|$, then the relative error in $fl(x^T y)$ may not
be small.
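
The bound (2.4.10) can be checked on a concrete instance. The sketch below assumes IEEE double precision floats ($u = 2^{-53}$), runs the loop (2.4.7) on pseudo-random data of our own choosing, and uses exact rational arithmetic to evaluate both the true error and the right-hand side of the bound.

```python
from fractions import Fraction
import random

U_DOUBLE = Fraction(1, 2 ** 53)       # unit roundoff of IEEE double precision

def dot(x, y):
    """Algorithm (2.4.7): s = s + x_k * y_k for k = 1:n."""
    s = 0.0
    for xk, yk in zip(x, y):
        s = s + xk * yk
    return s

rng = random.Random(0)                # fixed seed so the run is repeatable
n = 1000
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y = [rng.uniform(-1.0, 1.0) for _ in range(n)]

exact = sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))
abs_sum = sum(abs(Fraction(a) * Fraction(b)) for a, b in zip(x, y))   # |x|^T |y|

error = abs(Fraction(dot(x, y)) - exact)
bound = Fraction(101, 100) * n * U_DOUBLE * abs_sum                   # (2.4.10)
print(float(error), float(bound))
```

In runs of this kind the observed error is typically far below the bound, which is consistent with the remarks on a priori bounds quoted from Wilkinson in the next subsection.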


2.4.6 Alternative Ways to Quantify Roundoff Error


An easier but less rigorous way of bounding $\alpha$ in Lemma 2.4.1 is to say
$|\alpha| \le nu + O(u^2)$. With this convention we have

$$| fl(x^T y) - x^T y | \le nu\, |x|^T |y| + O(u^2). \tag{2.4.11}$$

Other ways of expressing the same result include

$$| fl(x^T y) - x^T y | \le \phi(n)\, u\, |x|^T |y| \tag{2.4.12}$$

and

$$| fl(x^T y) - x^T y | \le c\, nu\, |x|^T |y| , \tag{2.4.13}$$

where in (2.4.12) $\phi(n)$ is a "modest" function of $n$ and in (2.4.13) $c$ is a
constant of order unity.

We shall not express a preference for any of the error bounding styles
shown in (2.4.10)-(2.4.13). This spares us the necessity of translating the
roundoff results that appear in the literature into a fixed format. Moreover,
paying overly close attention to the details of an error bound is inconsistent
with the "philosophy" of roundoff analysis. As Wilkinson (1971, p. 567)
says,


    There is still a tendency to attach too much importance to the
    precise error bounds obtained by an a priori error analysis. In
    my opinion, the bound itself is usually the least important part
    of it. The main object of such an analysis is to expose the
    potential instabilities, if any, of an algorithm so that hopefully
    from the insight thus obtained one might be led to improved
    algorithms. Usually the bound itself is weaker than it might have
    been because of the necessity of restricting the mass of detail
    to a reasonable level and because of the limitations imposed by
    expressing the errors in terms of matrix norms. A priori bounds
    are not, in general, quantities that should be used in practice.
    Practical error bounds should usually be determined by some
    form of a posteriori error analysis, since this takes full
    advantage of the statistical distribution of rounding errors and of any
    special features, such as sparseness, in the matrix.


It is important to keep these perspectives in mind. 


2.4.7 Dot Product Accumulation 


Some computers have provision for accumulating dot products in double
precision. This means that if $x$ and $y$ are floating point vectors with length
$t$ mantissas, then the running sum $s$ in (2.4.7) is built up in a register with
a $2t$ digit mantissa. Since the multiplication of two t-digit floating point
numbers can be stored exactly in a double precision variable, it is only
when $s$ is written to single precision memory that any roundoff occurs. In
this situation one can usually assert that a computed dot product has good
relative error, i.e., $fl(x^T y) = x^T y (1 + \delta)$ where $|\delta| \approx u$. Thus,
the ability to accumulate dot products is very appealing.
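
The effect of accumulation can be imitated in software. The sketch below is a simulation under stated assumptions, not a description of any particular machine: IEEE single precision ($u = 2^{-24}$) plays the role of the working precision, Python's double precision floats play the role of the long register, and the hypothetical helper `to_single` rounds a double to the nearest single via the `struct` module. Nonnegative data are used so that no cancellation occurs in the sum.

```python
import struct
from fractions import Fraction
import random

def to_single(v):
    """Round a Python float (double) to the nearest IEEE single precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

rng = random.Random(1)
n = 1000
x = [to_single(rng.uniform(0.0, 1.0)) for _ in range(n)]
y = [to_single(rng.uniform(0.0, 1.0)) for _ in range(n)]
exact = sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))

# Single precision throughout: round after every multiply and every add.
s_single = 0.0
for a, b in zip(x, y):
    s_single = to_single(s_single + to_single(a * b))

# Double precision accumulation: round to single only once, at the end.
s_double = 0.0
for a, b in zip(x, y):
    s_double += a * b                 # product of two singles is exact in double
s_accum = to_single(s_double)

u_single = Fraction(1, 2 ** 24)
err_single = abs(Fraction(s_single) - exact) / exact
err_accum = abs(Fraction(s_accum) - exact) / exact
print(float(err_single), float(err_accum), float(u_single))
```

With accumulation the relative error is essentially that of the single final rounding, whereas the all-single loop can lose accuracy in proportion to $n$.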


2.4.8 Roundoff in Other Basic Matrix Computations


It is easy to show that if $A$ and $B$ are floating point matrices and $\alpha$ is a
floating point number, then

$$fl(\alpha A) = \alpha A + E, \qquad |E| \le u |\alpha A| \tag{2.4.14}$$

and

$$fl(A + B) = (A + B) + E, \qquad |E| \le u |A + B| . \tag{2.4.15}$$

As a consequence of these two results, it is easy to verify that computed
saxpy's and outer product updates satisfy

$$fl(\alpha x + y) = \alpha x + y + z, \qquad |z| \le u (2|\alpha x| + |y|) + O(u^2) \tag{2.4.16}$$

$$fl(C + u v^T) = C + u v^T + E, \qquad |E| \le u (|C| + 2|u v^T|) + O(u^2). \tag{2.4.17}$$

Using (2.4.10) it is easy to show that a dot product based multiplication of
two floating point matrices $A$ and $B$ satisfies

$$fl(AB) = AB + E, \qquad |E| \le nu\, |A| |B| + O(u^2). \tag{2.4.18}$$

The same result applies if a gaxpy or outer product based procedure is used.
Notice that matrix multiplication does not necessarily give small relative
error since $|AB|$ may be much smaller than $|A| |B|$, e.g.,

$$\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -.99 & 0 \end{bmatrix} = \begin{bmatrix} .01 & 0 \\ 0 & 0 \end{bmatrix} .$$

It is easy to obtain norm bounds from the roundoff results developed thus
far. If we look at the 1-norm error in floating point matrix multiplication,
then it is easy to show from (2.4.18) that

$$\| fl(AB) - AB \|_1 \le nu\, \| A \|_1 \| B \|_1 + O(u^2). \tag{2.4.19}$$


2.4.9 Forward and Backward Error Analyses 


Each roundoff bound given above is the consequence of a forward error
analysis. An alternative style of characterizing the roundoff errors in an
algorithm is accomplished through a technique known as backward error
analysis. Here, the rounding errors are related to the data of the problem
rather than to its solution. By way of illustration, consider the $n = 2$
version of triangular matrix multiplication. It can be shown that:

$$fl(AB) = \begin{bmatrix} a_{11} b_{11} (1 + \epsilon_1) & \left( a_{11} b_{12} (1 + \epsilon_2) + a_{12} b_{22} (1 + \epsilon_3) \right)(1 + \epsilon_4) \\ 0 & a_{22} b_{22} (1 + \epsilon_5) \end{bmatrix}$$

where $|\epsilon_i| \le u$, for $i = 1{:}5$. However, if we define

$$\hat{A} = \begin{bmatrix} a_{11} & a_{12} (1 + \epsilon_3)(1 + \epsilon_4) \\ 0 & a_{22} (1 + \epsilon_5) \end{bmatrix}$$

and

$$\hat{B} = \begin{bmatrix} b_{11} (1 + \epsilon_1) & b_{12} (1 + \epsilon_2)(1 + \epsilon_4) \\ 0 & b_{22} \end{bmatrix}$$

then it is easily verified that $fl(AB) = \hat{A} \hat{B}$. Moreover,

$$\hat{A} = A + E, \qquad |E| \le 2u |A| + O(u^2)$$
$$\hat{B} = B + F, \qquad |F| \le 2u |B| + O(u^2).$$

In other words, the computed product is the exact product of slightly
perturbed $A$ and $B$.
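
The backward error construction can be carried out explicitly for a specific pair of triangular matrices. In the sketch below, which assumes IEEE double precision and uses arbitrary illustrative entries, the individual rounding errors $\epsilon_1, \ldots, \epsilon_5$ are recovered exactly with rational arithmetic, the perturbed factors $\hat{A}$ and $\hat{B}$ are formed as defined in the text, and their exact product is compared with the computed product.

```python
from fractions import Fraction as Fr

U = Fr(1, 2 ** 53)      # unit roundoff of IEEE double precision

# Upper triangular 2-by-2 data (arbitrary illustrative values).
a11, a12, a22 = 0.1, 0.2, 0.3
b11, b12, b22 = 0.7, 0.6, 0.9

# Computed product, one rounding per operation.
c11 = a11 * b11
p2, p3 = a11 * b12, a12 * b22
c12 = p2 + p3
c22 = a22 * b22

# Recover the individual rounding errors exactly.
e1 = Fr(c11) / (Fr(a11) * Fr(b11)) - 1
e2 = Fr(p2) / (Fr(a11) * Fr(b12)) - 1
e3 = Fr(p3) / (Fr(a12) * Fr(b22)) - 1
e4 = Fr(c12) / (Fr(p2) + Fr(p3)) - 1
e5 = Fr(c22) / (Fr(a22) * Fr(b22)) - 1

# Perturbed factors as defined in the text.
A_hat = [[Fr(a11), Fr(a12) * (1 + e3) * (1 + e4)], [Fr(0), Fr(a22) * (1 + e5)]]
B_hat = [[Fr(b11) * (1 + e1), Fr(b12) * (1 + e2) * (1 + e4)], [Fr(0), Fr(b22)]]

# Exact product of the perturbed factors.
h11 = A_hat[0][0] * B_hat[0][0]
h12 = A_hat[0][0] * B_hat[0][1] + A_hat[0][1] * B_hat[1][1]
h22 = A_hat[1][1] * B_hat[1][1]
print(h11 == Fr(c11), h12 == Fr(c12), h22 == Fr(c22))
```

The equalities hold exactly, not merely to working precision, which is the point of the backward error viewpoint.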


2.4.10 Error in Strassen Multiplication 


In §1.3.8 we outlined an unconventional matrix multiplication procedure
due to Strassen (1969). It is instructive to compare the effect of roundoff
in this method with the effect of roundoff in any of the conventional matrix
multiplication methods of §1.1.

It can be shown that the Strassen approach (Algorithm 1.3.1) produces
a $\hat{C} = fl(AB)$ that satisfies an inequality of the form (2.4.19). This is
perfectly satisfactory in many applications. However, the $\hat{C}$ that Strassen's
method produces does not always satisfy an inequality of the form (2.4.18).
To see this, suppose

$$A = B = \begin{bmatrix} .99 & .0010 \\ .0010 & .99 \end{bmatrix}$$

and that we execute Algorithm 1.3.1 using 2-digit floating point arithmetic.
Among other things, the following quantities are computed:

$$\hat{P}_3 = fl(.99(.001 - .99)) = -.98$$
$$\hat{P}_5 = fl((.99 + .001).99) = .98$$
$$\hat{c}_{12} = fl(\hat{P}_3 + \hat{P}_5) = 0.0$$

Now in exact arithmetic $c_{12} = 2(.001)(.99) = .00198$ and thus Algorithm 1.3.1
produces a $\hat{c}_{12}$ with no correct significant digits. The Strassen approach gets
into trouble in this example because small off-diagonal entries are combined
with large diagonal entries. Note that in conventional matrix multiplication
neither $b_{12}$ and $b_{22}$ nor $a_{11}$ and $a_{12}$ are summed. Thus the contribution of
the small off-diagonal elements is not lost. Indeed, for the above $A$ and $B$
a conventional matrix multiply gives $\hat{c}_{12} = .0020$.
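
The numbers in this example can be reproduced with Python's `decimal` module. The sketch assumes that a `Context` with `prec=2` and `ROUND_HALF_UP` is an adequate model of the 2-digit arithmetic in the text, and it evaluates only the two Strassen products that enter $c_{12}$, written as $P_3 = a_{11}(b_{12} - b_{22})$ and $P_5 = (a_{11} + a_{12}) b_{22}$ to match the quantities displayed above.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

fl = Context(prec=2, rounding=ROUND_HALF_UP)   # 2-digit decimal arithmetic

a11 = b22 = Decimal("0.99")
a12 = b12 = Decimal("0.001")

# Strassen: c12 = P3 + P5 with P3 = a11 (b12 - b22), P5 = (a11 + a12) b22.
p3 = fl.multiply(a11, fl.subtract(b12, b22))
p5 = fl.multiply(fl.add(a11, a12), b22)
c12_strassen = fl.add(p3, p5)

# Conventional: c12 = a11 b12 + a12 b22.
c12_conventional = fl.add(fl.multiply(a11, b12), fl.multiply(a12, b22))

c12_exact = a11 * b12 + a12 * b22    # default context has ample precision here
print(p3, p5, c12_strassen, c12_conventional, c12_exact)
```

The small entry .001 is rounded away as soon as it is added to or subtracted from .99, so the two Strassen products cancel exactly.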

Failure to produce a componentwise accurate $\hat{C}$ can be a serious
shortcoming in some applications. For example, in Markov processes the $a_{ij}$,
$b_{ij}$, and $c_{ij}$ are transition probabilities and are therefore nonnegative. It
may be critical to compute $c_{ij}$ accurately if it reflects a particularly
important probability in the modeled phenomena. Note that if $A \ge 0$ and
$B \ge 0$, then conventional matrix multiplication produces a product $\hat{C}$ that
has small componentwise relative error:

$$| \hat{C} - C | \le nu\, |A| |B| + O(u^2) = nu\, |C| + O(u^2).$$

This follows from (2.4.18). Because we cannot say the same for the Strassen
approach, we conclude that Algorithm 1.3.1 is not attractive for certain
nonnegative matrix multiplication problems if relatively accurate $\hat{c}_{ij}$ are
required.

Extrapolating from this discussion we reach two fairly obvious but im- 
portant conclusions: 


• Different methods for computing the same quantity can produce
substantially different results.


• Whether or not an algorithm produces satisfactory results depends
upon the type of problem solved and the goals of the user.


These observations are clarified in subsequent chapters and are intimately
related to the concepts of algorithm stability and problem condition.


Problems 


P2.4.1 Show that if (2.4.7) is applied with $y = x$, then
$fl(x^T x) = x^T x (1 + \alpha)$ where $|\alpha| \le nu + O(u^2)$.

P2.4.2 Prove (2.4.3).

P2.4.3 Show that if $E \in \mathbb{R}^{m \times n}$ with $m \ge n$, then
$\| \, |E| \, \|_2 \le \sqrt{n}\, \| E \|_2$. This result is useful when deriving norm
bounds from absolute value bounds.

P2.4.4 Assume the existence of a square root function satisfying
$fl(\sqrt{x}) = \sqrt{x}(1 + \epsilon)$ with $|\epsilon| \le u$. Give an algorithm for
computing $\| x \|_2$ and bound the rounding errors.

P2.4.5 Suppose $A$ and $B$ are n-by-n upper triangular floating point matrices. If
$\hat{C} = fl(AB)$ is computed using one of the conventional §1.1 algorithms, does it
follow that $\hat{C} = \hat{A} \hat{B}$ where $\hat{A}$ and $\hat{B}$ are close to $A$ and $B$?

P2.4.6 Suppose $A$ and $B$ are n-by-n floating point matrices and that $A$ is
nonsingular with $\| \, |A^{-1}| \, |A| \, \|_\infty = \tau$. Show that if
$\hat{C} = fl(AB)$ is obtained using any of the algorithms in §1.1, then there exists
a $\hat{B}$ so $\hat{C} = A \hat{B}$ and
$\| \hat{B} - B \|_\infty \le nu\, \tau \| B \|_\infty + O(u^2)$.

P2.4.7 Prove (2.4.18).


Notes and References for Sec. 2.4
For a general introduction to the effects of roundoff error, we recommend


J.H. Wilkinson (1963). Rounding Errors in Algebraic Processes, Prentice-Hall, Engle- 
wood Cliffs, NJ. 

J.H. Wilkinson (1971). "Modern Error Analysis," SIAM Review 13, 548-68.

D. Kahaner, C.B. Moler, and S. Nash (1988). Numerical Methods and Software, Prentice- 
Hall, Englewood Cliffs, NJ. 

F. Chaitin-Chatelin and V. Frayssé (1996). Lectures on Finite Precision Computations,
SIAM Publications, Philadelphia.


More recent developments in error analysis involve interval analysis, the building of
statistical models of roundoff error, and the automating of the analysis itself:


T.E. Hull and J.R. Swensen (1966). “Tests of Probabilistic Models for Propagation of 
Roundoff Errors,” Comm. ACM. 9, 108-13. 

J. Larson and A. Sameh (1978). "Efficient Calculation of the Effects of Roundoff Errors,"
ACM Trans. Math. Soft. 4, 228-36.

W. Miller and D. Spooner (1978). "Software for Roundoff Analysis, II," ACM Trans.
Math. Soft. 4, 369-90.

J.M. Yohe (1979). “Software for Interval Arithmetic: A Reasonable Portable Package,” 
ACM Trans. Math. Soft. 5, 50-63. 


Anyone engaged in serious software development needs a thorough understanding of 
floating point arithmetic. A good way to begin acquiring knowledge in this direction is 
to read about the IEEE floating point standard in


D. Goldberg (1991). “What Every Computer Scientist Should Know About Floating 
Point Arithmetic,” ACM Surveys 23, 5-48. 


See also 


R.P. Brent (1978). "A Fortran Multiple Precision Arithmetic Package," ACM Trans.
Math. Soft. 4, 57-70.

R.P. Brent (1978). “Algorithm 524 MP, a Fortran Multiple Precision Arithmetic Pack- 
age,” ACM Trans. Math. Soft. 4, 71-81. 

J.W. Demmel (1984). “Underflow and the Reliability of Numerical Software,” SIAM J. 
Sci and Stat. Comp. 5, 887—919. 

U.W. Kulisch and W.L. Miranker (1986). "The Arithmetic of the Digital Computer," 
SIAM Review 28, 1-40. 

W.J. Cody (1988). "ALGORITHM 665 MACHAR: A Subroutine to Dynamically
Determine Machine Parameters," ACM Trans. Math. Soft. 14, 303-311.

D.H. Bailey, H.D. Simon, J.T. Barton, and M.J. Fouts (1989). "Floating Point Arithmetic
in Future Supercomputers," Int'l J. Supercomputing Appl. 3, 86-90.

D.H. Bailey (1993). “Algorithm 719: Multiprecision Translation and Execution of FOR- 
TRAN Programs,” ACM Trans. Math. Soft. 19, 288-319. 


The subtleties associated with the development of high-quality software, even for
"simple" problems, are immense. A good example is the design of a subroutine to
compute 2-norms.


2.5 Orthogonality and the SVD


Orthogonality has a very prominent role to play in matrix computations.
After establishing a few definitions we prove the extremely useful singular
value decomposition (SVD). Among other things, the SVD enables us to
intelligently handle the matrix rank problem. The concept of rank, though
perfectly clear in the exact arithmetic context, is tricky in the presence of
roundoff error and fuzzy data. With the SVD we can introduce the practical
notion of numerical rank.


2.5.1 Orthogonality 


A set of vectors $\{x_1, \ldots, x_p\}$ in $\mathbb{R}^m$ is orthogonal if
$x_i^T x_j = 0$ whenever $i \ne j$ and orthonormal if $x_i^T x_j = \delta_{ij}$.
Intuitively, orthogonal vectors are maximally independent for they point in
totally different directions.

A collection of subspaces $S_1, \ldots, S_p$ in $\mathbb{R}^m$ is mutually orthogonal if
$x^T y = 0$ whenever $x \in S_i$ and $y \in S_j$ for $i \ne j$. The orthogonal
complement of a subspace $S \subseteq \mathbb{R}^m$ is defined by

$$S^\perp = \{ y \in \mathbb{R}^m : y^T x = 0 \ \text{for all} \ x \in S \}$$

and it is not hard to show that $\mathrm{ran}(A)^\perp = \mathrm{null}(A^T)$. The vectors
$v_1, \ldots, v_k$ form an orthonormal basis for a subspace $S \subseteq \mathbb{R}^m$ if
they are orthonormal and span $S$.

A matrix $Q \in \mathbb{R}^{m \times m}$ is said to be orthogonal if $Q^T Q = I$. If
$Q = [\, q_1, \ldots, q_m \,]$ is orthogonal, then the $q_i$ form an orthonormal basis
for $\mathbb{R}^m$. It is always possible to extend such a basis to a full orthonormal
basis $\{v_1, \ldots, v_m\}$ for $\mathbb{R}^m$:


Theorem 2.5.1 If $V_1 \in \mathbb{R}^{n \times r}$ has orthonormal columns, then there
exists $V_2 \in \mathbb{R}^{n \times (n-r)}$ such that

$$V = [\, V_1 \ V_2 \,]$$

is orthogonal. Note that $\mathrm{ran}(V_1)^\perp = \mathrm{ran}(V_2)$.


Proof. This is a standard result from introductory linear algebra. It is
also a corollary of the QR factorization that we present in §5.2. □


2.5.2 Norms and Orthogonal Transformations


The 2-norm is invariant under orthogonal transformation, for if $Q^T Q = I$,
then $\| Qx \|_2^2 = x^T Q^T Q x = x^T x = \| x \|_2^2$. The matrix 2-norm and
the Frobenius norm are also invariant with respect to orthogonal
transformations. In particular, it is easy to show that for all orthogonal $Q$ and $Z$
of appropriate dimensions we have

$$\| QAZ \|_F = \| A \|_F \tag{2.5.1}$$

and

$$\| QAZ \|_2 = \| A \|_2 . \tag{2.5.2}$$


2.5.3 The Singular Value Decomposition 


The theory of norms developed in the previous two sections can be used to
prove the extremely useful singular value decomposition.


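Theorem 2.5.2 (Singular Value Decomposition (SVD)) If $A$ is a real
m-by-n matrix, then there exist orthogonal matrices

$$U = [\, u_1, \ldots, u_m \,] \in \mathbb{R}^{m \times m} \quad \text{and} \quad V = [\, v_1, \ldots, v_n \,] \in \mathbb{R}^{n \times n}$$

such that

$$U^T A V = \mathrm{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{m \times n}, \qquad p = \min\{m, n\}$$

where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$.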

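Proof. Let $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ be unit 2-norm vectors that
satisfy $Ax = \sigma y$ with $\sigma = \| A \|_2$. From Theorem 2.5.1 there exist
$V_2 \in \mathbb{R}^{n \times (n-1)}$ and $U_2 \in \mathbb{R}^{m \times (m-1)}$ so
$V = [\, x \ V_2 \,] \in \mathbb{R}^{n \times n}$ and
$U = [\, y \ U_2 \,] \in \mathbb{R}^{m \times m}$ are orthogonal. It is not hard to show
that $U^T A V$ has the following structure:

$$U^T A V = \begin{bmatrix} \sigma & w^T \\ 0 & B \end{bmatrix} \equiv A_1 .$$

Since

$$\left\| A_1 \begin{bmatrix} \sigma \\ w \end{bmatrix} \right\|_2^2 \ge (\sigma^2 + w^T w)^2$$

we have $\| A_1 \|_2^2 \ge (\sigma^2 + w^T w)$. But
$\sigma^2 = \| A \|_2^2 = \| A_1 \|_2^2$, and so we must have $w = 0$. An obvious
induction argument completes the proof of the theorem. □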


The $\sigma_i$ are the singular values of $A$ and the vectors $u_i$ and $v_i$ are the
ith left singular vector and the ith right singular vector respectively. It
is easy to verify by comparing columns in the equations $AV = U\Sigma$ and
$A^T U = V \Sigma^T$ that

$$A v_i = \sigma_i u_i, \qquad A^T u_i = \sigma_i v_i, \qquad i = 1{:}\min\{m, n\}.$$

It is convenient to have the following notation for designating singular
values:

    $\sigma_i(A)$      = the ith largest singular value of A,
    $\sigma_{max}(A)$  = the largest singular value of A,
    $\sigma_{min}(A)$  = the smallest singular value of A.

The singular values of a matrix $A$ are precisely the lengths of the semi-axes
of the hyperellipsoid $E$ defined by $E = \{ Ax : \| x \|_2 = 1 \}$.


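Example 2.5.1

$$A = \begin{bmatrix} .96 & 1.72 \\ 2.28 & .96 \end{bmatrix} = U \Sigma V^T = \begin{bmatrix} .6 & -.8 \\ .8 & .6 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} .8 & .6 \\ .6 & -.8 \end{bmatrix}^T .$$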
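
The factorization in Example 2.5.1 can be confirmed with exact rational arithmetic. The sketch below multiplies out $U \Sigma V^T$, checks that $U$ and $V$ are orthogonal, and also evaluates the sum of squares of the entries of $A$, which equals $\sigma_1^2 + \sigma_2^2 = 10$. The helper names are ours.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

U = [[Fr(6, 10), Fr(-8, 10)], [Fr(8, 10), Fr(6, 10)]]
S = [[Fr(3), Fr(0)], [Fr(0), Fr(1)]]
V = [[Fr(8, 10), Fr(6, 10)], [Fr(6, 10), Fr(-8, 10)]]
I2 = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]

A = matmul(matmul(U, S), transpose(V))
frob_sq = sum(a * a for row in A for a in row)   # equals sigma_1^2 + sigma_2^2
print([[float(a) for a in row] for row in A], frob_sq)
```

Working with fractions avoids any question of whether a discrepancy is due to roundoff or to an error in the factors.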


The SVD reveals a great deal about the structure of a matrix. If the
SVD of $A$ is given by Theorem 2.5.2, and we define $r$ by

$$\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_p = 0,$$

then

$$\mathrm{rank}(A) = r \tag{2.5.3}$$
$$\mathrm{null}(A) = \mathrm{span}\{ v_{r+1}, \ldots, v_n \} \tag{2.5.4}$$
$$\mathrm{ran}(A) = \mathrm{span}\{ u_1, \ldots, u_r \}, \tag{2.5.5}$$

and we have the SVD expansion

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T . \tag{2.5.6}$$

Various 2-norm and Frobenius norm properties have connections to the
SVD. If $A \in \mathbb{R}^{m \times n}$, then

$$\| A \|_F^2 = \sigma_1^2 + \cdots + \sigma_p^2, \qquad p = \min\{m, n\} \tag{2.5.7}$$
$$\| A \|_2 = \sigma_1 \tag{2.5.8}$$
$$\min_{x \ne 0} \frac{\| Ax \|_2}{\| x \|_2} = \sigma_n \qquad (m \ge n). \tag{2.5.9}$$


2.5.4 The Thin SVD

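If $A = U \Sigma V^T \in \mathbb{R}^{m \times n}$ is the SVD of $A$ and $m \ge n$, then

$$A = U_1 \Sigma_1 V^T$$

where

$$U_1 = U(:, 1{:}n) = [\, u_1, \ldots, u_n \,] \in \mathbb{R}^{m \times n}$$

and

$$\Sigma_1 = \Sigma(1{:}n, 1{:}n) = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{n \times n} .$$

We refer to this much-used, trimmed down version of the SVD as the thin
SVD.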


2.5.5 Rank Deficiency and the SVD 


One of the most valuable aspects of the SVD is that it enables us to deal
sensibly with the concept of matrix rank. Numerous theorems in linear
algebra have the form "if such-and-such a matrix has full rank, then
such-and-such a property holds." While neat and aesthetic, results of this flavor
do not help us address the numerical difficulties frequently encountered in
situations where near rank deficiency prevails. Rounding errors and fuzzy
data make rank determination a nontrivial exercise. Indeed, for some small
$\epsilon$ we may be interested in the $\epsilon$-rank of a matrix which we define by

$$\mathrm{rank}(A, \epsilon) = \min_{\| A - B \|_2 \le \epsilon} \mathrm{rank}(B).$$

Thus, if $A$ is obtained in a laboratory with each $a_{ij}$ correct to within $\pm .001$,
then it might make sense to look at $\mathrm{rank}(A, .001)$. Along the same lines, if
$A$ is an m-by-n floating point matrix then it is reasonable to regard $A$ as
numerically rank deficient if $\mathrm{rank}(A, \epsilon) < \min\{m, n\}$ with
$\epsilon = u \| A \|_2$.

Numerical rank deficiency and $\epsilon$-rank are nicely characterized in terms
of the SVD because the singular values indicate how near a given matrix is
to a matrix of lower rank.


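Theorem 2.5.3 Let the SVD of $A \in \mathbb{R}^{m \times n}$ be given by Theorem 2.5.2. If
$k < r = \mathrm{rank}(A)$ and

$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T , \tag{2.5.10}$$

then

$$\min_{\mathrm{rank}(B) = k} \| A - B \|_2 = \| A - A_k \|_2 = \sigma_{k+1} . \tag{2.5.11}$$

Proof. Since $U^T A_k V = \mathrm{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0)$ it follows
that $\mathrm{rank}(A_k) = k$ and that
$U^T (A - A_k) V = \mathrm{diag}(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_p)$ and so
$\| A - A_k \|_2 = \sigma_{k+1}$.

Now suppose $\mathrm{rank}(B) = k$ for some $B \in \mathbb{R}^{m \times n}$. It follows that we
can find orthonormal vectors $x_1, \ldots, x_{n-k}$ so
$\mathrm{null}(B) = \mathrm{span}\{ x_1, \ldots, x_{n-k} \}$.
A dimension argument shows that

$$\mathrm{span}\{ x_1, \ldots, x_{n-k} \} \cap \mathrm{span}\{ v_1, \ldots, v_{k+1} \} \ne \{ 0 \} .$$

Let $z$ be a unit 2-norm vector in this intersection. Since $Bz = 0$ and

$$Az = \sum_{i=1}^{k+1} \sigma_i (v_i^T z) u_i$$

we have

$$\| A - B \|_2^2 \ge \| (A - B) z \|_2^2 = \| Az \|_2^2 = \sum_{i=1}^{k+1} \sigma_i^2 (v_i^T z)^2 \ge \sigma_{k+1}^2 ,$$

completing the proof of the theorem. □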
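
Theorem 2.5.3 can be illustrated with the matrix of Example 2.5.1, for which $\sigma_1 = 3$ and $\sigma_2 = 1$. The sketch below forms $A_1 = \sigma_1 u_1 v_1^T$ and evaluates $\| A - A_1 \|_2$ using a closed-form expression for the largest eigenvalue of a 2-by-2 symmetric matrix; the helper `norm_2_of_2x2` and the comparison matrix `B` are our own illustrative choices.

```python
import math
from fractions import Fraction as Fr

def norm_2_of_2x2(M):
    """2-norm of a 2-by-2 matrix from the largest eigenvalue of M^T M."""
    g11 = M[0][0] ** 2 + M[1][0] ** 2
    g12 = M[0][0] * M[0][1] + M[1][0] * M[1][1]
    g22 = M[0][1] ** 2 + M[1][1] ** 2
    trace, det = g11 + g22, g11 * g22 - g12 * g12
    disc = max(trace * trace - 4 * det, 0)     # guard against tiny negatives
    return math.sqrt((trace + math.sqrt(disc)) / 2)

A = [[Fr(96, 100), Fr(172, 100)], [Fr(228, 100), Fr(96, 100)]]
sigma1, u1, v1 = Fr(3), [Fr(6, 10), Fr(8, 10)], [Fr(8, 10), Fr(6, 10)]

# A_1 = sigma_1 u_1 v_1^T, the k = 1 case of (2.5.10).
A1 = [[sigma1 * u1[i] * v1[j] for j in range(2)] for i in range(2)]
diff = [[A[i][j] - A1[i][j] for j in range(2)] for i in range(2)]

# Another rank-one matrix, for comparison (an arbitrary choice).
B = [[Fr(1), Fr(1)], [Fr(2), Fr(2)]]
diff_B = [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]
print(norm_2_of_2x2(diff), norm_2_of_2x2(diff_B))
```

The first value is $\sigma_2 = 1$, and the second, for a different rank-one matrix, is larger, as (2.5.11) requires.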

Theorem 2.5.3 says that the smallest singular value of $A$ is the 2-norm
distance of $A$ to the set of all rank-deficient matrices. It also follows that
the set of full rank matrices in $\mathbb{R}^{m \times n}$ is both open and dense.

Finally, if $r_\epsilon = \mathrm{rank}(A, \epsilon)$, then

$$\sigma_1 \ge \cdots \ge \sigma_{r_\epsilon} > \epsilon \ge \sigma_{r_\epsilon + 1} \ge \cdots \ge \sigma_p, \qquad p = \min\{m, n\}.$$

We have more to say about the numerical rank issue in §5.5 and §12.2.


2.5.6 Unitary Matrices 


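Over the complex field the unitary matrices correspond to the orthogonal
matrices. In particular, $Q \in \mathbb{C}^{n \times n}$ is unitary if
$Q^H Q = Q Q^H = I_n$. Unitary matrices preserve 2-norm. The SVD of a complex
matrix involves unitary matrices. If $A \in \mathbb{C}^{m \times n}$, then there exist
unitary matrices $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ such that

$$U^H A V = \mathrm{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{m \times n}, \qquad p = \min\{m, n\}$$

where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$.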


Problems 


P2.5.1 Show that if $S$ is real and $S^T = -S$, then $I - S$ is nonsingular and the
matrix $(I - S)^{-1}(I + S)$ is orthogonal. This is known as the Cayley transform of $S$.

P2.5.2 Show that a triangular orthogonal matrix is diagonal.

P2.5.3 Show that if $Q = Q_1 + i Q_2$ is unitary with
$Q_1, Q_2 \in \mathbb{R}^{n \times n}$, then the 2n-by-2n real matrix

$$Z = \begin{bmatrix} Q_1 & -Q_2 \\ Q_2 & Q_1 \end{bmatrix}$$

is orthogonal.

P2.5.4 Establish properties (2.5.3)-(2.5.9).

P2.5.5 Prove that

$$\sigma_{max}(A) = \max_{y \in \mathbb{R}^m,\ x \in \mathbb{R}^n} \frac{y^T A x}{\| x \|_2 \| y \|_2} .$$

P2.5.6 For the 2-by-2 matrix $A = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$, derive
expressions for $\sigma_{max}(A)$ and $\sigma_{min}(A)$ that are functions of $w$, $x$,
$y$, and $z$.

P2.5.7 Show that any matrix in $\mathbb{R}^{m \times n}$ is the limit of a sequence of
full rank matrices.

P2.5.8 Show that if $A \in \mathbb{R}^{m \times n}$ has rank $n$, then
$\| A (A^T A)^{-1} A^T \|_2 = 1$.

P2.5.9 What is the nearest rank-one matrix to
$A = \begin{bmatrix} 1 & M \\ 0 & 1 \end{bmatrix}$ in the Frobenius norm?

P2.5.10 Show that if $A \in \mathbb{R}^{m \times n}$ then
$\| A \|_F \le \sqrt{\mathrm{rank}(A)}\, \| A \|_2$, thereby sharpening (2.3.7).


Notes and References for Sec. 2.5 


Forsythe and Moler (1967) offer a good account of the SVD's role in the analysis of the
$Ax = b$ problem. Their proof of the decomposition is more traditional than ours in that
it makes use of the eigenvalue theory for symmetric matrices. Historical SVD references
include


E. Beltrami (1873). "Sulle Funzioni Bilineari," Giornale di Matematiche 11, 98-106.

C. Eckart and G. Young (1939). "A Principal Axis Transformation for Non-Hermitian
Matrices," Bull. Amer. Math. Soc. 45, 118-21.

G.W. Stewart (1993). "On the Early History of the Singular Value Decomposition,"
SIAM Review 35, 551-566.


One of the most significant developments in scientific computation has been the increased
use of the SVD in application areas that require the intelligent handling of matrix rank.
The range of applications is impressive. One of the most interesting is

C.B. Moler and D. Morrison (1983). "Singular Value Analysis of Cryptograms," Amer.
Math. Monthly 90, 78-87.

For generalizations of the SVD to infinite dimensional Hilbert space, see

I.C. Gohberg and M.G. Krein (1969). Introduction to the Theory of Linear Non-Self
Adjoint Operators, Amer. Math. Soc., Providence, RI.

F. Smithies (1970). Integral Equations, Cambridge University Press, Cambridge.

Reducing the rank of a matrix as in Theorem 2.5.3 when the perturbing matrix is
constrained is discussed in


J.W. Demmel (1987). "The Smallest Perturbation of a Submatrix which Lowers the Rank
and Constrained Total Least Squares Problems," SIAM J. Numer. Anal. 24, 199-206.




G.H. Golub, A. Hoffman, and G.W. Stewart (1988). "A Generalization of the Eckart-
Young-Mirsky Approximation Theorem," Lin. Alg. and Its Applic. 88/89, 317-328.

G.A. Watson (1988). “The Smallest Perturbation of a Submatrix which Lowers the Rank 
of the Matrix,” IMA J. Numer. Anal. 8, 295-304. 


2.6 Projections and the CS Decomposition 


If the object of a computation is to compute a matrix or a vector, then 
norms are useful for assessing the accuracy of the answer or for measuring 
progress during an iteration. If the object of a computation is to compute 
a subspace, then to make similar comments we need to be able to quantify 
the distance between two subspaces. Orthogonal projections are critical in 
this regard. After the elementary concepts are established we discuss the 
CS decomposition. This is an SVD-like decomposition that is handy when 
having to compare a pair of subspaces. We begin with the notion of an 
orthogonal projection. 


2.6.1 Orthogonal Projections 


Let $S \subseteq \mathbb{R}^n$ be a subspace. $P \in \mathbb{R}^{n\times n}$ is the orthogonal projection onto
$S$ if $\mathrm{ran}(P) = S$, $P^2 = P$, and $P^T = P$. From this definition it is easy to
show that if $x \in \mathbb{R}^n$, then $Px \in S$ and $(I - P)x \in S^{\perp}$.

If $P_1$ and $P_2$ are each orthogonal projections, then for any $z \in \mathbb{R}^n$ we
have

$$\| (P_1 - P_2)z \|_2^2 = (P_1 z)^T (I - P_2) z + (P_2 z)^T (I - P_1) z.$$

If $\mathrm{ran}(P_1) = \mathrm{ran}(P_2) = S$, then the right-hand side of this expression is
zero, showing that the orthogonal projection for a subspace is unique. If the
columns of $V = [\, v_1, \ldots, v_k \,]$ are an orthonormal basis for a subspace $S$, then
it is easy to show that $P = VV^T$ is the unique orthogonal projection onto
$S$. Note that if $v \in \mathbb{R}^n$, then $P = vv^T / v^T v$ is the orthogonal projection
onto $S = \mathrm{span}\{v\}$.
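
As an illustration of the rank-one case, the following sketch forms $P = vv^T / v^T v$ for a small vector and splits a vector $x$ into its components in $S$ and $S^{\perp}$. The function names and the test vectors are hypothetical choices made for this example only.

```python
def projector_onto_span(v):
    """Return P = v v^T / (v^T v) as a list of rows."""
    vtv = sum(vi * vi for vi in v)
    if vtv == 0:
        raise ValueError("v must be nonzero")
    return [[vi * vj / vtv for vj in v] for vi in v]


def matvec(P, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(pij * xj for pij, xj in zip(row, x)) for row in P]


v = [3.0, 4.0]
P = projector_onto_span(v)
x = [1.0, 2.0]
Px = matvec(P, x)                               # component of x in span{v}
residual = [xi - pi for xi, pi in zip(x, Px)]   # (I - P)x, orthogonal to v
```

Applying `matvec(P, Px)` reproduces `Px` up to roundoff, which is the property $P^2 = P$ seen on a single vector.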


2.6.2 SVD-Related Projections 


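There are several important orthogonal projections associated with the singular
value decomposition. Suppose $A = U\Sigma V^T \in \mathbb{R}^{m\times n}$ is the SVD of $A$
and that $r = \mathrm{rank}(A)$. If we have the $U$ and $V$ partitionings

$$U = [\, U_r \;\; \tilde{U}_r \,], \qquad V = [\, V_r \;\; \tilde{V}_r \,],$$

where $U_r$ and $V_r$ each have $r$ columns, $\tilde{U}_r$ has $m-r$ columns, and $\tilde{V}_r$ has $n-r$ columns, then

$V_r V_r^T$ = projection onto $\mathrm{null}(A)^{\perp} = \mathrm{ran}(A^T)$
$\tilde{V}_r \tilde{V}_r^T$ = projection onto $\mathrm{null}(A)$
$U_r U_r^T$ = projection onto $\mathrm{ran}(A)$
$\tilde{U}_r \tilde{U}_r^T$ = projection onto $\mathrm{ran}(A)^{\perp} = \mathrm{null}(A^T)$


2.6.3 Distance Between Subspaces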


The one-to-one correspondence between subspaces and orthogonal projections
enables us to devise a notion of distance between subspaces. Suppose
$S_1$ and $S_2$ are subspaces of $\mathbb{R}^n$ and that $\dim(S_1) = \dim(S_2)$. We define the
distance between these two spaces by

$$\mathrm{dist}(S_1, S_2) = \| P_1 - P_2 \|_2 \qquad (2.6.1)$$

where $P_i$ is the orthogonal projection onto $S_i$. The distance between a
pair of subspaces can be characterized in terms of the blocks of a certain
orthogonal matrix.


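Theorem 2.6.1 Suppose

$$W = [\, W_1 \;\; W_2 \,], \qquad Z = [\, Z_1 \;\; Z_2 \,]$$

are n-by-n orthogonal matrices, where $W_1$ and $Z_1$ each have $k$ columns and $W_2$ and $Z_2$ each have $n-k$ columns. If $S_1 = \mathrm{ran}(W_1)$ and $S_2 = \mathrm{ran}(Z_1)$, then

$$\mathrm{dist}(S_1, S_2) = \| W_1^T Z_2 \|_2 = \| Z_1^T W_2 \|_2.$$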
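Proof.

$$\mathrm{dist}(S_1, S_2) = \| W_1 W_1^T - Z_1 Z_1^T \|_2 = \| W^T (W_1 W_1^T - Z_1 Z_1^T) Z \|_2 = \left\| \begin{bmatrix} 0 & W_1^T Z_2 \\ -W_2^T Z_1 & 0 \end{bmatrix} \right\|_2.$$

Note that the matrices $W_2^T Z_1$ and $W_1^T Z_2$ are submatrices of the orthogonal
matrix

$$Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} = \begin{bmatrix} W_1^T Z_1 & W_1^T Z_2 \\ W_2^T Z_1 & W_2^T Z_2 \end{bmatrix} = W^T Z.$$

Our goal is to show that $\| Q_{21} \|_2 = \| Q_{12} \|_2$. Since $Q$ is orthogonal it
follows from

$$Q \begin{bmatrix} x \\ 0 \end{bmatrix} = \begin{bmatrix} Q_{11} x \\ Q_{21} x \end{bmatrix}$$

that

$$1 = \| Q_{11} x \|_2^2 + \| Q_{21} x \|_2^2$$

for all unit 2-norm $x \in \mathbb{R}^k$. Thus,

$$\| Q_{21} \|_2^2 = \max_{\| x \|_2 = 1} \| Q_{21} x \|_2^2 = 1 - \min_{\| x \|_2 = 1} \| Q_{11} x \|_2^2 = 1 - \sigma_{\min}(Q_{11})^2.$$

Analogously, by working with $Q^T$ (which is also orthogonal) it is possible
to show that

$$\| Q_{12}^T \|_2^2 = 1 - \sigma_{\min}(Q_{11})^2$$

and therefore

$$\| Q_{12} \|_2^2 = 1 - \sigma_{\min}(Q_{11})^2.$$

Thus, $\| Q_{21} \|_2 = \| Q_{12} \|_2$. □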
Note that if $S_1$ and $S_2$ are subspaces in $\mathbb{R}^n$ with the same dimension, then

$$0 \le \mathrm{dist}(S_1, S_2) \le 1.$$

The distance is zero if $S_1 = S_2$ and one if $S_1 \cap S_2^{\perp} \ne \{0\}$.
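
A small numerical check of definition (2.6.1) can be made for two lines in $\mathbb{R}^2$, where the 2-norm of the symmetric 2-by-2 matrix $P_1 - P_2$ is available in closed form from its eigenvalues. The sketch below is restricted to that case; the helper names are hypothetical, and the angle of 30 degrees is chosen only for illustration.

```python
import math


def projector(v):
    """Orthogonal projection onto span{v} for a nonzero vector v."""
    vtv = sum(c * c for c in v)
    return [[a * b / vtv for b in v] for a in v]


def sym2_norm(M):
    """2-norm of a symmetric 2-by-2 matrix: the largest |eigenvalue|."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    mean = 0.5 * (a + d)
    rad = math.hypot(0.5 * (a - d), b)
    return max(abs(mean + rad), abs(mean - rad))


def dist_lines(x, y):
    """dist(span{x}, span{y}) = || P1 - P2 ||_2 for vectors in R^2."""
    P1, P2 = projector(x), projector(y)
    D = [[P1[i][j] - P2[i][j] for j in range(2)] for i in range(2)]
    return sym2_norm(D)


theta = math.pi / 6
x = [1.0, 0.0]
y = [math.cos(theta), math.sin(theta)]
d = dist_lines(x, y)   # equals sin(theta) up to roundoff
```

For identical lines the computed distance is zero, and for perpendicular lines it is one, matching the two extreme cases noted above.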

A more refined analysis of the blocks of the Q matrix above sheds more 
light on the difference between a pair of subspaces. This requires a special 
SVD-like decomposition for orthogonal matrices. 


2.6.4 The CS Decomposition 


The blocks of an orthogonal matrix partitioned into 2-by-2 form have highly 
related SVDs. This is the gist of the CS decomposition. We prove a very 
useful special case first. 


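Theorem 2.6.2 (The CS Decomposition (Thin Version)) Consider the
matrix

$$Q = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix}, \qquad Q_1 \in \mathbb{R}^{m_1 \times n}, \quad Q_2 \in \mathbb{R}^{m_2 \times n}$$

where $m_1 \ge n$ and $m_2 \ge n$. If the columns of $Q$ are orthonormal, then there
exist orthogonal matrices $U_1 \in \mathbb{R}^{m_1 \times m_1}$, $U_2 \in \mathbb{R}^{m_2 \times m_2}$, and $V_1 \in \mathbb{R}^{n \times n}$ such
that

$$\begin{bmatrix} U_1 & 0 \\ 0 & U_2 \end{bmatrix}^T \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} V_1 = \begin{bmatrix} C \\ S \end{bmatrix}$$

where

$$C = \mathrm{diag}(\cos(\theta_1), \ldots, \cos(\theta_n)), \qquad S = \mathrm{diag}(\sin(\theta_1), \ldots, \sin(\theta_n)),$$

and

$$0 \le \theta_1 \le \theta_2 \le \cdots \le \theta_n \le \frac{\pi}{2}.$$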


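Proof. Since $\| Q_1 \|_2 \le \| Q \|_2 = 1$, the singular values of $Q_1$ are all in
the interval [0, 1]. Let

$$U_1^T Q_1 V_1 = C = \mathrm{diag}(c_1, \ldots, c_n) = \begin{bmatrix} I_t & 0 \\ 0 & \Sigma \end{bmatrix}$$

(with block column widths $t$ and $n-t$) be the SVD of $Q_1$ where we assume

$$1 = c_1 = \cdots = c_t > c_{t+1} \ge \cdots \ge c_n \ge 0.$$

To complete the proof of the theorem we must construct the orthogonal
matrix $U_2$. If

$$Q_2 V_1 = [\, W_1 \;\; W_2 \,]$$

where $W_1$ has $t$ columns and $W_2$ has $n-t$ columns, then

$$\begin{bmatrix} U_1 & 0 \\ 0 & I_{m_2} \end{bmatrix}^T \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} V_1 = \begin{bmatrix} I_t & 0 \\ 0 & \Sigma \\ W_1 & W_2 \end{bmatrix}.$$

Since the columns of this matrix have unit 2-norm, $W_1 = 0$. The columns
of $W_2$ are nonzero and mutually orthogonal because

$$W_2^T W_2 = I_{n-t} - \Sigma^T \Sigma = \mathrm{diag}(1 - c_{t+1}^2, \ldots, 1 - c_n^2)$$

is nonsingular. If $s_k = \sqrt{1 - c_k^2}$ for $k = 1{:}n$, then the columns of

$$Z = W_2\, \mathrm{diag}(1/s_{t+1}, \ldots, 1/s_n)$$

are orthonormal. By Theorem 2.5.1 there exists an orthogonal matrix
$U_2 \in \mathbb{R}^{m_2 \times m_2}$ with $U_2(:, t+1{:}n) = Z$. It is easy to verify that

$$U_2^T Q_2 V_1 = \mathrm{diag}(s_1, \ldots, s_n) = S.$$

Since $c_k^2 + s_k^2 = 1$ for $k = 1{:}n$, it follows that these quantities are the required
cosines and sines. □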


Using the same sort of techniques it is possible to prove the following more 
general version of the decomposition: 


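Theorem 2.6.3 (CS Decomposition (General Version)) If

$$Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}$$

is a 2-by-2 (arbitrary) partitioning of an n-by-n orthogonal matrix, then
there exist orthogonal

$$U = \begin{bmatrix} U_1 & 0 \\ 0 & U_2 \end{bmatrix} \qquad \text{and} \qquad V = \begin{bmatrix} V_1 & 0 \\ 0 & V_2 \end{bmatrix}$$

such that

$$U^T Q V = \begin{bmatrix} I & 0 & 0 & 0 & 0 & 0 \\ 0 & C & 0 & 0 & S & 0 \\ 0 & 0 & 0 & 0 & 0 & I \\ 0 & 0 & 0 & I & 0 & 0 \\ 0 & S & 0 & 0 & -C & 0 \\ 0 & 0 & I & 0 & 0 & 0 \end{bmatrix}$$

where $C = \mathrm{diag}(c_1, \ldots, c_p)$ and $S = \mathrm{diag}(s_1, \ldots, s_p)$ are square diagonal
matrices with $0 < c_i, s_i < 1$.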


Proof. See Paige and Saunders (1981) for details. We have suppressed the
dimensions of the zero submatrices, some of which may be empty. □


The essential message of the decomposition is that the SVDs of the $Q_{ij}$ are
highly related.


Example 2.6.1 The matrix


—0.7576 0.3697 0.3838 0.2126 — —0.3112 

—0.4077  —0.1552 | —0.1129 0.2676 0.8517 

Q= | —0.0488 0.7240 | —0.6730 -0.1301 0.0602 
; : —0.9235 . 

0.4530 0.5612 0.5806 0.1162 0.3595 


UTQV = 


The angles associated with the cosines and sines turn out to be very
important in a number of applications. See §12.4.


Problems 


P2.6.1 Show that if P is an orthogonal projection, then Q = I — 2P is orthogonal. 
P2.6.2 What are the singular values of an orthogonal projection?


P2.6.3 Suppose $S_1 = \mathrm{span}\{x\}$ and $S_2 = \mathrm{span}\{y\}$, where $x$ and $y$ are unit 2-norm
vectors in $\mathbb{R}^2$. Working only with the definition of dist(·,·), show that $\mathrm{dist}(S_1, S_2) =
\sqrt{1 - (x^T y)^2}$, verifying that the distance between $S_1$ and $S_2$ equals the sine of the angle
between $x$ and $y$.


Notes and References for Sec. 2.6
The following papers discuss various aspects of the CS decomposition:


C. Davis and W. Kahan (1970). "The Rotation of Eigenvectors by a Perturbation III,"
SIAM J. Num. Anal. 7, 1-46.

G.W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Projections and Linear
Least Squares Problems," SIAM Review 19, 634-662.

C.C. Paige and M. Saunders (1981). "Toward a Generalized Singular Value Decomposition,"
SIAM J. Num. Anal. 18, 398-405.

C.C. Paige and M. Wei (1994). "History and Generality of the CS Decomposition," Lin.
Alg. and Its Applic. 208/209, 303-326.

See §8.7 for some computational details.
For a deeper geometrical understanding of the CS decomposition and the notion of
distance between subspaces, see


T.A. Arias, A. Edelman, and S. Smith (1996). "Conjugate Gradient and Newton's
Method on the Grassman and Stiefel Manifolds," to appear in SIAM J. Matrix Anal.
Appl.


2.7 The Sensitivity of Square Systems 


We now use some of the tools developed in previous sections to analyze the
linear system problem $Ax = b$ where $A \in \mathbb{R}^{n\times n}$ is nonsingular and $b \in \mathbb{R}^n$.
Our aim is to examine how perturbations in $A$ and $b$ affect the solution $x$.
A much more detailed treatment may be found in Higham (1996).


2.7.1 An SVD Analysis 

If

$$A = \sum_{i=1}^{n} \sigma_i u_i v_i^T = U \Sigma V^T$$

is the SVD of $A$, then

$$x = A^{-1} b = (U \Sigma V^T)^{-1} b = \sum_{i=1}^{n} \frac{u_i^T b}{\sigma_i} v_i. \qquad (2.7.1)$$

This expansion shows that small changes in $A$ or $b$ can induce relatively
large changes in $x$ if $\sigma_n$ is small.
It should come as no surprise that the magnitude of $\sigma_n$ should have
a bearing on the sensitivity of the $Ax = b$ problem when we recall from
Theorem 2.5.3 that $\sigma_n$ is the distance from $A$ to the set of singular matrices.
As the matrix of coefficients approaches this set, it is intuitively clear that
the solution $x$ should be increasingly sensitive to perturbations.


2.7.2 Condition 


A precise measure of linear system sensitivity can be obtained by considering
the parameterized system

$$(A + \epsilon F)\, x(\epsilon) = b + \epsilon f, \qquad x(0) = x$$

where $F \in \mathbb{R}^{n\times n}$ and $f \in \mathbb{R}^n$. If $A$ is nonsingular, then it is clear that $x(\epsilon)$
is differentiable in a neighborhood of zero. Moreover, $\dot{x}(0) = A^{-1}(f - Fx)$
and thus, the Taylor series expansion for $x(\epsilon)$ has the form

$$x(\epsilon) = x + \epsilon\, \dot{x}(0) + O(\epsilon^2).$$

Using any vector norm and consistent matrix norm we obtain

$$\frac{\| x(\epsilon) - x \|}{\| x \|} \le |\epsilon|\, \| A^{-1} \| \left\{ \frac{\| f \|}{\| x \|} + \| F \| \right\} + O(\epsilon^2). \qquad (2.7.2)$$


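For square matrices $A$ define the condition number $\kappa(A)$ by

$$\kappa(A) = \| A \|\, \| A^{-1} \| \qquad (2.7.3)$$

with the convention that $\kappa(A) = \infty$ for singular $A$. Using the inequality
$\| b \| \le \| A \|\, \| x \|$ it follows from (2.7.2) that

$$\frac{\| x(\epsilon) - x \|}{\| x \|} \le \kappa(A)(\rho_A + \rho_b) + O(\epsilon^2) \qquad (2.7.4)$$

where

$$\rho_A = |\epsilon| \frac{\| F \|}{\| A \|} \qquad \text{and} \qquad \rho_b = |\epsilon| \frac{\| f \|}{\| b \|}$$

represent the relative errors in $A$ and $b$, respectively. Thus, the relative
error in $x$ can be $\kappa(A)$ times the relative error in $A$ and $b$. In this sense, the
condition number $\kappa(A)$ quantifies the sensitivity of the $Ax = b$ problem.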
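
To make the amplification concrete, the following sketch computes the infinity-norm condition number of a nearly singular 2-by-2 matrix and compares the relative change in the solution with the relative change in $b$ (with $A$ left unperturbed). The matrix, right-hand side, and function names are hypothetical and chosen only for illustration; the explicit inverse is used purely because the example is 2-by-2.

```python
def inf_norm(A):
    """Infinity norm of a matrix stored as a list of rows."""
    return max(sum(abs(a) for a in row) for row in A)


def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def cond_inf(A):
    """kappa_inf(A) = ||A||_inf ||A^{-1}||_inf for a 2-by-2 matrix."""
    return inf_norm(A) * inf_norm(inverse_2x2(A))


def solve_2x2(A, b):
    Ainv = inverse_2x2(A)
    return [sum(r * bj for r, bj in zip(row, b)) for row in Ainv]


A = [[1.0, 1.0], [1.0, 1.0001]]
b = [2.0, 2.0001]
x = solve_2x2(A, b)                      # close to [1, 1]
db = [0.0, 1e-4]                         # small change in b
y = solve_2x2(A, [bi + di for bi, di in zip(b, db)])

rel_b = max(abs(d) for d in db) / max(abs(v) for v in b)
rel_x = max(abs(yi - xi) for xi, yi in zip(x, y)) / max(abs(v) for v in x)
kappa = cond_inf(A)                      # roughly 4e4
```

Here `rel_x` is several orders of magnitude larger than `rel_b`, yet it stays below `kappa * rel_b`, as the analysis predicts.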

Note that $\kappa(\cdot)$ depends on the underlying norm and subscripts are used
accordingly, e.g.,

$$\kappa_2(A) = \| A \|_2\, \| A^{-1} \|_2 = \frac{\sigma_1(A)}{\sigma_n(A)}. \qquad (2.7.5)$$

Thus, the 2-norm condition of a matrix $A$ measures the elongation of the
hyperellipsoid $\{Ax : \| x \|_2 = 1\}$.

We mention two other characterizations of the condition number. For
p-norm condition numbers, we have

$$\frac{1}{\kappa_p(A)} = \min_{A + \Delta A\ \text{singular}} \frac{\| \Delta A \|_p}{\| A \|_p}. \qquad (2.7.6)$$

This result may be found in Kahan (1966) and shows that $\kappa_p(A)$ measures
the relative p-norm distance from $A$ to the set of singular matrices.

For any norm, we also have

$$\kappa(A) = \lim_{\epsilon \to 0}\ \sup_{\| \Delta A \| \le \epsilon \| A \|} \frac{\| (A + \Delta A)^{-1} - A^{-1} \|}{\epsilon} \cdot \frac{1}{\| A^{-1} \|}. \qquad (2.7.7)$$

This imposing result merely says that the condition number is a normalized
Frechet derivative of the map $A \to A^{-1}$. Further details may be found in
Rice (1966b). Recall that we were initially led to $\kappa(A)$ through differentiation.

If $\kappa(A)$ is large, then $A$ is said to be an ill-conditioned matrix. Note that
this is a norm-dependent property*. However, any two condition numbers
$\kappa_\alpha(\cdot)$ and $\kappa_\beta(\cdot)$ on $\mathbb{R}^{n\times n}$ are equivalent in that constants $c_1$ and $c_2$ can be
found for which

$$c_1 \kappa_\alpha(A) \le \kappa_\beta(A) \le c_2 \kappa_\alpha(A), \qquad A \in \mathbb{R}^{n\times n}.$$


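For example, on $\mathbb{R}^{n\times n}$ we have

$$\begin{aligned}
\tfrac{1}{n}\,\kappa_2(A) &\le \kappa_1(A) \le n\,\kappa_2(A) \\
\tfrac{1}{n}\,\kappa_\infty(A) &\le \kappa_2(A) \le n\,\kappa_\infty(A) \\
\tfrac{1}{n^2}\,\kappa_1(A) &\le \kappa_\infty(A) \le n^2\,\kappa_1(A).
\end{aligned} \qquad (2.7.8)$$

Thus, if a matrix is ill-conditioned in the $\alpha$-norm, it is ill-conditioned in
the $\beta$-norm modulo the constants $c_1$ and $c_2$ above.

For any of the p-norms, we have $\kappa_p(A) \ge 1$. Matrices with small condition
numbers are said to be well-conditioned. In the 2-norm, orthogonal
matrices are perfectly conditioned in that $\kappa_2(Q) = 1$ if $Q$ is orthogonal.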


2.7.3 Determinants and Nearness to Singularity 


It is natural to consider how well determinant size measures ill-conditioning.
If $\det(A) = 0$ is equivalent to singularity, is $\det(A) \approx 0$ equivalent to near
singularity? Unfortunately, there is little correlation between $\det(A)$ and
the condition of $Ax = b$. For example, the matrix $B_n$ defined by

$$B_n = \begin{bmatrix} 1 & -1 & \cdots & -1 \\ 0 & 1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \in \mathbb{R}^{n\times n} \qquad (2.7.9)$$

has determinant 1, but $\kappa_\infty(B_n) = n 2^{n-1}$. On the other hand, a very well
conditioned matrix can have a very small determinant. For example,

$$D_n = \mathrm{diag}(10^{-1}, \ldots, 10^{-1}) \in \mathbb{R}^{n\times n}$$

satisfies $\kappa_p(D_n) = 1$ although $\det(D_n) = 10^{-n}$.
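
Both claims can be checked directly for a modest value of n. The sketch below builds $B_n$, inverts it by back substitution, and evaluates the infinity-norm condition number; it then does the same for $D_n$. The function names and the choice n = 10 are illustrative assumptions.

```python
def B(n):
    """The matrix (2.7.9): ones on the diagonal, -1 above it."""
    return [[1.0 if i == j else (-1.0 if j > i else 0.0) for j in range(n)]
            for i in range(n)]


def upper_tri_inverse(U):
    """Invert a nonsingular upper triangular matrix, one column at a time."""
    n = len(U)
    X = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n - 1, -1, -1):
            rhs = 1.0 if i == col else 0.0
            s = sum(U[i][j] * X[j][col] for j in range(i + 1, n))
            X[i][col] = (rhs - s) / U[i][i]
    return X


def inf_norm(A):
    return max(sum(abs(a) for a in row) for row in A)


n = 10
Bn = B(n)
kappa_B = inf_norm(Bn) * inf_norm(upper_tri_inverse(Bn))   # n * 2**(n-1)
det_B = 1.0
for i in range(n):
    det_B *= Bn[i][i]          # determinant of a triangular matrix

d = [0.1] * n                  # diagonal of D_n
kappa_D = max(d) / min(d)      # condition of a positive diagonal matrix
det_D = 0.1 ** n
```

With n = 10 the determinant of $B_n$ is exactly one while its condition number is 5120, and $D_n$ has condition number one with a determinant of about $10^{-10}$.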


2.7.4 A Rigorous Norm Bound 


Recall that the derivation of (2.7.4) was valuable because it highlighted the
connection between $\kappa(A)$ and the rate of change of $x(\epsilon)$ at $\epsilon = 0$. However,
it is a little unsatisfying because it is contingent on $\epsilon$ being "small enough"
and because it sheds no light on the size of the $O(\epsilon^2)$ term. In this and the
next subsection we develop some additional $Ax = b$ perturbation theorems
that are completely rigorous.

*It also depends upon the definition of "large." The matter is pursued in §3.5.

We first establish a useful lemma that indicates in terms of $\kappa(A)$ when
we can expect a perturbed system to be nonsingular.


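Lemma 2.7.1 Suppose

$$\begin{aligned}
Ax &= b, & A &\in \mathbb{R}^{n\times n},\ 0 \ne b \in \mathbb{R}^n \\
(A + \Delta A)y &= b + \Delta b, & \Delta A &\in \mathbb{R}^{n\times n},\ \Delta b \in \mathbb{R}^n
\end{aligned}$$

with $\| \Delta A \| \le \epsilon \| A \|$ and $\| \Delta b \| \le \epsilon \| b \|$. If $\epsilon\,\kappa(A) = r < 1$, then $A + \Delta A$
is nonsingular and

$$\frac{\| y \|}{\| x \|} \le \frac{1 + r}{1 - r}.$$

Proof. Since $\| A^{-1} \Delta A \| \le \epsilon \| A^{-1} \|\, \| A \| = r < 1$ it follows from
Theorem 2.3.4 that $(A + \Delta A)$ is nonsingular. Using Lemma 2.3.3 and the
equality $(I + A^{-1} \Delta A) y = x + A^{-1} \Delta b$ we find

$$\| y \| \le \| (I + A^{-1} \Delta A)^{-1} \| \left( \| x \| + \epsilon \| A^{-1} \|\, \| b \| \right)
\le \frac{1}{1 - r} \left( \| x \| + \epsilon \| A^{-1} \|\, \| b \| \right) = \frac{1}{1 - r} \left( \| x \| + r \frac{\| b \|}{\| A \|} \right).$$

Since $\| b \| = \| Ax \| \le \| A \|\, \| x \|$ it follows that

$$\| y \| \le \frac{1}{1 - r} \left( \| x \| + r \| x \| \right). \qquad \Box$$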

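We are now set to establish a rigorous $Ax = b$ perturbation bound.

Theorem 2.7.2 If the conditions of Lemma 2.7.1 hold, then

$$\frac{\| y - x \|}{\| x \|} \le \frac{2\epsilon}{1 - r}\, \kappa(A). \qquad (2.7.10)$$

Proof. Since

$$y - x = A^{-1} \Delta b - A^{-1} \Delta A\, y \qquad (2.7.11)$$

we have $\| y - x \| \le \epsilon \| A^{-1} \|\, \| b \| + \epsilon \| A^{-1} \|\, \| A \|\, \| y \|$ and so

$$\frac{\| y - x \|}{\| x \|} \le \epsilon\,\kappa(A) \frac{\| b \|}{\| A \|\, \| x \|} + \epsilon\,\kappa(A) \frac{\| y \|}{\| x \|}
\le \epsilon\,\kappa(A) \left( 1 + \frac{1 + r}{1 - r} \right) = \frac{2\epsilon}{1 - r}\, \kappa(A). \qquad \Box$$

Example 2.7.1 The $Ax = b$ problem

$$\begin{bmatrix} 1 & 0 \\ 0 & 10^{-6} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 10^{-6} \end{bmatrix}$$

has solution $x = (1, 1)^T$ and condition $\kappa_\infty(A) = 10^6$. If $\Delta b = (10^{-6}, 0)^T$, $\Delta A = 0$,
and $(A + \Delta A)y = b + \Delta b$, then $y = (1 + 10^{-6}, 1)^T$ and the inequality (2.7.10) says

$$10^{-6} = \frac{\| x - y \|_\infty}{\| x \|_\infty} \le \frac{\| \Delta b \|_\infty}{\| b \|_\infty}\, \kappa_\infty(A) = 10^{-6}\, 10^{6} = 1.$$

Thus, the upper bound in (2.7.10) can be a gross overestimate of the error induced by the
perturbation. On the other hand, if $\Delta b = (0, 10^{-6})^T$, $\Delta A = 0$, and $(A + \Delta A)y = b + \Delta b$,
then this inequality says

$$\frac{10^{0}}{10^{0}} \le 2 \times 10^{-6}\, 10^{6}.$$

Thus, there are perturbations for which the bound in (2.7.10) is essentially attained.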
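
The two perturbations in Example 2.7.1 can be reproduced in a few lines. The sketch below stores the diagonal of $A$ as a list and, as a simplifying assumption, compares each relative error with the quantity $\epsilon\,\kappa_\infty(A)$, where $\epsilon = \| \Delta b \|_\infty / \| b \|_\infty$, rather than with the full right-hand side of (2.7.10). The function names are hypothetical.

```python
def solve_diag(d, b):
    """Solve a diagonal system with diagonal entries d."""
    return [bi / di for di, bi in zip(d, b)]


def inf_norm(v):
    return max(abs(c) for c in v)


d = [1.0, 1e-6]              # A = diag(1, 1e-6)
b = [1.0, 1e-6]
x = solve_diag(d, b)         # [1, 1]
kappa = max(d) / min(d)      # kappa_inf(A) for a positive diagonal matrix


def relative_change(db):
    """Return (relative error in x, eps * kappa) for a perturbation db of b."""
    y = solve_diag(d, [bi + di for bi, di in zip(b, db)])
    err = inf_norm([yi - xi for xi, yi in zip(x, y)]) / inf_norm(x)
    eps = inf_norm(db) / inf_norm(b)
    return err, eps * kappa


err1, bound1 = relative_change([1e-6, 0.0])   # harmless direction
err2, bound2 = relative_change([0.0, 1e-6])   # damaging direction
```

The first perturbation gives a relative error near $10^{-6}$ against a bound near one; the second gives a relative error of one, so the same bound is essentially attained.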


2.7.5 Some Rigorous Componentwise Bounds 


We conclude this section by showing that a more refined perturbation the- 
ory is possible if componentwise perturbation bounds are in effect and if 
we make use of the absolute value notation. 


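Theorem 2.7.3 Suppose

$$\begin{aligned}
Ax &= b, & A &\in \mathbb{R}^{n\times n},\ 0 \ne b \in \mathbb{R}^n \\
(A + \Delta A)y &= b + \Delta b, & \Delta A &\in \mathbb{R}^{n\times n},\ \Delta b \in \mathbb{R}^n
\end{aligned}$$

and that $|\Delta A| \le \epsilon |A|$ and $|\Delta b| \le \epsilon |b|$. If $\epsilon\,\kappa_\infty(A) = r < 1$, then $(A + \Delta A)$
is nonsingular and

$$\frac{\| y - x \|_\infty}{\| x \|_\infty} \le \frac{2\epsilon}{1 - r}\, \| \, |A^{-1}|\, |A| \, \|_\infty.$$

Proof. Since $\| \Delta A \|_\infty \le \epsilon \| A \|_\infty$ and $\| \Delta b \|_\infty \le \epsilon \| b \|_\infty$ the conditions of
Lemma 2.7.1 are satisfied in the infinity norm. This implies that $A + \Delta A$
is nonsingular and

$$\frac{\| y \|_\infty}{\| x \|_\infty} \le \frac{1 + r}{1 - r}.$$

Now using (2.7.11) we find

$$|y - x| \le |A^{-1}|\, |\Delta b| + |A^{-1}|\, |\Delta A|\, |y|
\le \epsilon |A^{-1}|\, |b| + \epsilon |A^{-1}|\, |A|\, |y| \le \epsilon |A^{-1}|\, |A| \left( |x| + |y| \right).$$

If we take norms, then

$$\| y - x \|_\infty \le \epsilon\, \| \, |A^{-1}|\, |A| \, \|_\infty \left( \| x \|_\infty + \frac{1 + r}{1 - r} \| x \|_\infty \right).$$

The theorem follows upon division by $\| x \|_\infty$. □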


We refer to the quantity $\| \, |A^{-1}|\, |A| \, \|_\infty$ as the Skeel condition number. It
has been effectively used in the analysis of several important linear system
computations. See §3.5.

Lastly, we report on the results of Oettli and Prager (1964) that indicate
when an approximate solution $\hat{x} \in \mathbb{R}^n$ to the n-by-n system $Ax = b$ satisfies
a perturbed system with prescribed structure. In particular, suppose
$E \in \mathbb{R}^{n\times n}$ and $f \in \mathbb{R}^n$ are given and have nonnegative entries. We seek
$\Delta A \in \mathbb{R}^{n\times n}$, $\Delta b \in \mathbb{R}^n$, and $\omega \ge 0$ such that

$$(A + \Delta A)\hat{x} = b + \Delta b, \qquad |\Delta A| \le \omega E, \qquad |\Delta b| \le \omega f. \qquad (2.7.12)$$

Note that by properly choosing $E$ and $f$ the perturbed system can take on
certain qualities. For example, if $E = |A|$ and $f = |b|$ and $\omega$ is small, then
$\hat{x}$ satisfies a nearby system in the componentwise sense. Oettli and Prager
(1964) show that for a given $A$, $b$, $\hat{x}$, $E$, and $f$ the smallest $\omega$ possible in
(2.7.12) is given by

$$\omega_{\min} = \max_{1 \le i \le n} \frac{|A\hat{x} - b|_i}{(E|\hat{x}| + f)_i}.$$

If $A\hat{x} = b$ then $\omega_{\min} = 0$. On the other hand, if $\omega_{\min} = \infty$, then $\hat{x}$ does
not satisfy any system of the prescribed perturbation structure.
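
The formula for $\omega_{\min}$ translates directly into code. In the sketch below a zero residual component contributes zero even when its denominator vanishes, and a nonzero residual over a zero denominator contributes infinity; that treatment of the degenerate ratios is an assumption of this example, as are the function name and the sample data.

```python
import math


def omega_min(A, b, xhat, E, f):
    """Smallest omega for which (2.7.12) can hold, per the max-ratio formula."""
    n = len(A)
    omega = 0.0
    for i in range(n):
        resid = abs(sum(A[i][j] * xhat[j] for j in range(n)) - b[i])
        denom = sum(E[i][j] * abs(xhat[j]) for j in range(n)) + f[i]
        if resid == 0.0:
            ratio = 0.0            # assumed convention for 0/0
        elif denom == 0.0:
            ratio = math.inf       # no admissible perturbation in row i
        else:
            ratio = resid / denom
        omega = max(omega, ratio)
    return omega


A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 4.0]                     # exact solution is [1, 1]
xhat = [1.0, 1.001]                # approximate solution
E = [[abs(a) for a in row] for row in A]   # E = |A|
f = [abs(c) for c in b]                    # f = |b|
w = omega_min(A, b, xhat, E, f)    # componentwise relative backward error
```

With $E = |A|$ and $f = |b|$, a small value of `w` indicates that `xhat` solves a system whose entries differ from those of $A$ and $b$ by a small relative amount.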


Problems 


P2.7.1 Show that if $\| I \| \ge 1$, then $\kappa(A) \ge 1$.


P2.7.2 Show that for a given norm, $\kappa(AB) \le \kappa(A)\kappa(B)$ and that $\kappa(\alpha A) = \kappa(A)$ for all
nonzero $\alpha$.


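P2.7.3 Relate the 2-norm condition of $X \in \mathbb{R}^{m\times n}$ ($m \ge n$) to the 2-norm condition of
the matrices

$$B = \begin{bmatrix} I_m & X \\ 0 & I_n \end{bmatrix} \qquad \text{and} \qquad C = \begin{bmatrix} X \\ I_n \end{bmatrix}.$$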


Notes and References for Sec. 2.7
The condition concept is thoroughly investigated in


J. Rice (1966). "A Theory of Condition," SIAM J. Num. Anal. 3, 287-310.
W. Kahan (1966). "Numerical Linear Algebra," Canadian Math. Bull. 9, 757-801.


References for componentwise perturbation theory include


Chapter 3


General Linear Systems 


§3.1 Triangular Systems
§3.2 The LU Factorization
§3.3 Roundoff Analysis of Gaussian Elimination
§3.4 Pivoting
§3.5 Improving and Estimating Accuracy


The problem of solving a linear system $Ax = b$ is central in scientific
computation. In this chapter we focus on the method of Gaussian elimination,
the algorithm of choice when $A$ is square, dense, and unstructured.
When $A$ does not fall into this category, then the algorithms of Chapters
4, 5, and 10 are of interest. Some parallel $Ax = b$ solvers are discussed in
Chapter 6.
We motivate the method of Gaussian elimination in §3.1 by discussing
the ease with which triangular systems can be solved. The conversion of
a general system to triangular form via Gauss transformations is then presented
in §3.2 where the "language" of matrix factorizations is introduced.
Unfortunately, the derived method behaves very poorly on a nontrivial class
of problems. Our error analysis in §3.3 pinpoints the difficulty and motivates
§3.4, where the concept of pivoting is introduced. In the final section
we comment upon the important practical issues associated with scaling,
iterative improvement, and condition estimation.


Before You Begin 


Chapter 1, §§2.1-2.5, and §2.7 are assumed. Complementary references
include Forsythe and Moler (1967), Stewart (1973), Hager (1988), Watkins
(1991), Ciarlet (1992), Datta (1995), Higham (1996), Trefethen and Bau
(1996), and Demmel (1996). Some MATLAB functions important to this
chapter are lu, cond, rcond, and the "backslash" operator "\". LAPACK
connections include


Condition estimate
Solve $AX = B$, $A^T X = B$ with error bounds
Solve $AX = B$, $A^T X = B$
$A^{-1}$


Condition estimate via $PA = LU$
Improve $AX = B$, $A^T X = B$, $A^H X = B$ solutions with error bounds
Solve $AX = B$, $A^T X = B$, $A^H X = B$ with condition estimate
$PA = LU$
Solve $AX = B$, $A^T X = B$, $A^H X = B$ via $PA = LU$
$A^{-1}$
Equilibration

3.1 Triangular Systems 
Traditional factorization methods for linear systems involve the conversion 
of the given square system to a triangular system that has the same solution. 
This section is about the solution of triangular systems. 
3.1.1 Forward Substitution 
Consider the following 2-by-2 lower triangular system:

$$\begin{bmatrix} \ell_{11} & 0 \\ \ell_{21} & \ell_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}.$$

If $\ell_{11}\ell_{22} \ne 0$, then the unknowns can be determined sequentially:

$$\begin{aligned} x_1 &= b_1 / \ell_{11} \\ x_2 &= (b_2 - \ell_{21} x_1) / \ell_{22}. \end{aligned}$$

This is the 2-by-2 version of an algorithm known as forward substitution.
The general procedure is obtained by solving the ith equation in $Lx = b$
for $x_i$:

$$x_i = \left( b_i - \sum_{j=1}^{i-1} \ell_{ij} x_j \right) \Big/ \ell_{ii}.$$

If this is evaluated for i = 1:n, then a complete specification of $x$ is obtained.
Note that at the ith stage the dot product of L(i,1:i-1) and x(1:i-1) is
required. Since $b_i$ only is involved in the formula for $x_i$, the former may be
overwritten by the latter:


Algorithm 3.1.1 (Forward Substitution: Row Version) If $L \in \mathbb{R}^{n\times n}$
is lower triangular and $b \in \mathbb{R}^n$, then this algorithm overwrites $b$ with the
solution to $Lx = b$. $L$ is assumed to be nonsingular.


    b(1) = b(1)/L(1,1)
    for i = 2:n
        b(i) = (b(i) - L(i,1:i-1)b(1:i-1))/L(i,i)
    end

This algorithm requires $n^2$ flops. Note that $L$ is accessed by row. The
computed solution $\hat{x}$ satisfies:


$$(L + F)\hat{x} = b, \qquad |F| \le n\mathbf{u}|L| + O(\mathbf{u}^2). \qquad (3.1.1)$$


For a proof, see Higham (1996). It says that the computed solution exactly 
satisfies a slightly perturbed system. Moreover, each entry in the perturbing 
matrix F is small relative to the corresponding element of L. 
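
A direct transcription of Algorithm 3.1.1 into Python is sketched below. It uses plain lists with 0-based indexing and returns a new vector instead of overwriting $b$; both choices, the function name, and the 3-by-3 test system are assumptions of this illustration.

```python
def forward_substitution_row(L, b):
    """Solve Lx = b for nonsingular lower triangular L (row-oriented)."""
    n = len(b)
    x = list(b)                      # plays the role of the overwritten b
    for i in range(n):
        if L[i][i] == 0:
            raise ZeroDivisionError("L is singular")
        # Dot product of L(i,1:i-1) with the already computed x(1:i-1).
        dot = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (x[i] - dot) / L[i][i]
    return x


L = [[2.0, 0.0, 0.0],
     [1.0, 5.0, 0.0],
     [7.0, 9.0, 8.0]]
b = [6.0, 2.0, 5.0]
x = forward_substitution_row(L, b)
```

Each pass of the loop touches one row of `L`, which is the row-access pattern noted after the algorithm.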


3.1.2 Back Substitution 


The analogous algorithm for upper triangular systems $Ux = b$ is called
back-substitution. The recipe for $x_i$ is prescribed by

$$x_i = \left( b_i - \sum_{j=i+1}^{n} u_{ij} x_j \right) \Big/ u_{ii}$$

and once again $b_i$ can be overwritten by $x_i$.


Algorithm 3.1.2 (Back Substitution: Row Version) If $U \in \mathbb{R}^{n\times n}$
is upper triangular and $b \in \mathbb{R}^n$, then the following algorithm overwrites $b$
with the solution to $Ux = b$. $U$ is assumed to be nonsingular.

    b(n) = b(n)/U(n,n)
    for i = n-1:-1:1
        b(i) = (b(i) - U(i,i+1:n)b(i+1:n))/U(i,i)
    end
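
Algorithm 3.1.2 can be sketched in Python in the same style. The function name, the use of 0-based lists, and the sample system are illustrative assumptions; the routine returns a new vector rather than overwriting $b$.

```python
def back_substitution_row(U, b):
    """Solve Ux = b for nonsingular upper triangular U (row-oriented)."""
    n = len(b)
    x = list(b)
    for i in range(n - 1, -1, -1):
        if U[i][i] == 0:
            raise ZeroDivisionError("U is singular")
        # Dot product of U(i,i+1:n) with the already computed x(i+1:n).
        dot = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - dot) / U[i][i]
    return x


U = [[1.0, 4.0, 7.0],
     [0.0, -3.0, -6.0],
     [0.0, 0.0, 1.0]]
b = [1.0, 0.0, 1.0]
x = back_substitution_row(U, b)
```

The unknowns are produced in the order $x_n, x_{n-1}, \ldots, x_1$, the reverse of forward substitution.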
Algorithm 3.1.2 requires $n^2$ flops and accesses $U$ by row. The computed
solution $\hat{x}$ obtained by the algorithm can be shown to satisfy


$$(U + F)\hat{x} = b, \qquad |F| \le n\mathbf{u}|U| + O(\mathbf{u}^2). \qquad (3.1.2)$$


3.1.3 Column Oriented Versions


Column oriented versions of the above procedures can be obtained by reversing
loop orders. To understand what this means from the algebraic
point of view, consider forward substitution. Once $x_1$ is resolved, it can
be removed from equations 2 through n and we proceed with the reduced
system L(2:n,2:n)x(2:n) = b(2:n) - x(1)L(2:n,1). We then compute $x_2$ and
remove it from equations 3 through n, etc. Thus, if this approach is applied
to

$$\begin{bmatrix} 2 & 0 & 0 \\ 1 & 5 & 0 \\ 7 & 9 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 6 \\ 2 \\ 5 \end{bmatrix}$$

we find $x_1 = 3$ and then deal with the 2-by-2 system

$$\begin{bmatrix} 5 & 0 \\ 9 & 8 \end{bmatrix} \begin{bmatrix} x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \end{bmatrix} - 3 \begin{bmatrix} 1 \\ 7 \end{bmatrix} = \begin{bmatrix} -1 \\ -16 \end{bmatrix}.$$

Here is the complete procedure with overwriting.


Algorithm 3.1.3 (Forward Substitution: Column Version) If $L \in \mathbb{R}^{n\times n}$
is lower triangular and $b \in \mathbb{R}^n$, then this algorithm overwrites $b$ with the
solution to $Lx = b$. $L$ is assumed to be nonsingular.


    for j = 1:n-1
        b(j) = b(j)/L(j,j)
        b(j+1:n) = b(j+1:n) - b(j)L(j+1:n,j)
    end
    b(n) = b(n)/L(n,n)


It is also possible to obtain a column-oriented saxpy procedure for back- 
substitution. 


Algorithm 3.1.4 (Back Substitution: Column Version) If $U \in \mathbb{R}^{n\times n}$
is upper triangular and $b \in \mathbb{R}^n$, then this algorithm overwrites $b$ with the
solution to $Ux = b$. $U$ is assumed to be nonsingular.


    for j = n:-1:2
        b(j) = b(j)/U(j,j)
        b(1:j-1) = b(1:j-1) - b(j)U(1:j-1,j)
    end
    b(1) = b(1)/U(1,1)


Note that the dominant operation in both Algorithms 3.1.3 and 3.1.4 is 
the saxpy operation. The roundoff behavior of these saxpy implementations 
is essentially the same as for the dot product versions. 
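
The two column-oriented algorithms can be sketched in Python as follows. The inner loops are the saxpy updates; folding the final division into the main loop (instead of treating the last index separately) is a small restructuring made for this illustration, and the function names and test data are hypothetical.

```python
def forward_substitution_col(L, b):
    """Column (saxpy) version of forward substitution, cf. Algorithm 3.1.3."""
    n = len(b)
    x = list(b)
    for j in range(n):
        x[j] /= L[j][j]
        for i in range(j + 1, n):     # x(j+1:n) = x(j+1:n) - x(j) L(j+1:n, j)
            x[i] -= x[j] * L[i][j]
    return x


def back_substitution_col(U, b):
    """Column (saxpy) version of back substitution, cf. Algorithm 3.1.4."""
    n = len(b)
    x = list(b)
    for j in range(n - 1, -1, -1):
        x[j] /= U[j][j]
        for i in range(j):            # x(1:j-1) = x(1:j-1) - x(j) U(1:j-1, j)
            x[i] -= x[j] * U[i][j]
    return x


L = [[2.0, 0.0, 0.0], [1.0, 5.0, 0.0], [7.0, 9.0, 8.0]]
xl = forward_substitution_col(L, [6.0, 2.0, 5.0])

U = [[1.0, 4.0, 7.0], [0.0, -3.0, -6.0], [0.0, 0.0, 1.0]]
xu = back_substitution_col(U, [1.0, 0.0, 1.0])
```

After the first pass of `forward_substitution_col` on the 3-by-3 system, the trailing entries of the work vector are -1 and -16, the right-hand side of the reduced 2-by-2 system in the text.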

The accuracy of a computed solution to a triangular system is often
surprisingly good. See Higham (1996).


3.1.4 Multiple Right Hand Sides


Consider the problem of computing a solution $X \in \mathbb{R}^{n\times q}$ to $LX = B$ where
$L \in \mathbb{R}^{n\times n}$ is lower triangular and $B \in \mathbb{R}^{n\times q}$. This is the multiple right
hand side forward substitution problem. We show that such a problem
can be solved by a block algorithm that is rich in matrix multiplication
assuming that q and n are large enough. This turns out to be important in
subsequent sections where various block factorization schemes are discussed.
We mention that although we are considering here just the lower triangular
problem, everything we say applies to the upper triangular case as well.

To develop a block forward substitution algorithm we partition the equation
$LX = B$ as follows:

$$\begin{bmatrix} L_{11} & 0 & \cdots & 0 \\ L_{21} & L_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ L_{N1} & L_{N2} & \cdots & L_{NN} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix} = \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_N \end{bmatrix}. \qquad (3.1.3)$$

Assume that the diagonal blocks are square. Paralleling the development of
Algorithm 3.1.3, we solve the system $L_{11} X_1 = B_1$ for $X_1$ and then remove
$X_1$ from block equations 2 through N:

$$\begin{bmatrix} L_{22} & 0 & \cdots & 0 \\ L_{32} & L_{33} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ L_{N2} & L_{N3} & \cdots & L_{NN} \end{bmatrix} \begin{bmatrix} X_2 \\ X_3 \\ \vdots \\ X_N \end{bmatrix} = \begin{bmatrix} B_2 - L_{21} X_1 \\ B_3 - L_{31} X_1 \\ \vdots \\ B_N - L_{N1} X_1 \end{bmatrix}.$$

Continuing in this way we obtain the following block saxpy forward elimination
scheme:


    for j = 1:N
        Solve L_jj X_j = B_j
        for i = j+1:N
            B_i = B_i - L_ij X_j                    (3.1.4)
        end
    end
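
The scheme (3.1.4) can be sketched in pure Python as follows. The block size `r`, the helper `forward_sub_multi`, and the test matrices are assumptions of this illustration; the last block is allowed to be smaller than `r`, and the update of all blocks below the diagonal block is written as one loop over rows.

```python
def forward_sub_multi(L, B):
    """Solve L X = B for lower triangular L and a matrix B of right-hand sides."""
    n, q = len(L), len(B[0])
    X = [row[:] for row in B]
    for i in range(n):
        for c in range(q):
            s = sum(L[i][j] * X[j][c] for j in range(i))
            X[i][c] = (X[i][c] - s) / L[i][i]
    return X


def block_forward_substitution(L, B, r):
    """Block saxpy scheme (3.1.4) with diagonal blocks of order at most r."""
    n, q = len(L), len(B[0])
    B = [row[:] for row in B]                 # work on a copy
    for lo in range(0, n, r):
        hi = min(lo + r, n)
        # Solve L_jj X_j = B_j: the level-2 portion of the computation.
        Ljj = [row[lo:hi] for row in L[lo:hi]]
        Xj = forward_sub_multi(Ljj, B[lo:hi])
        for k in range(hi - lo):
            B[lo + k] = Xj[k]
        # Block saxpy B_i = B_i - L_ij X_j for every block row below block j.
        for i in range(hi, n):
            for c in range(q):
                B[i][c] -= sum(L[i][lo + k] * Xj[k][c] for k in range(hi - lo))
    return B


L = [[float(i + j + 1) if j <= i else 0.0 for j in range(5)] for i in range(5)]
B = [[float(i + 1), float(2 * i - 1)] for i in range(5)]
X_block = block_forward_substitution(L, B, 2)
X_ref = forward_sub_multi(L, B)
```

In a tuned implementation the inner update would be a single matrix-matrix multiplication; here it is spelled out with loops so that the sketch needs nothing beyond the standard library.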


Notice that the i-loop in (3.1.4) oversees a single block saxpy update of the form

$$\begin{bmatrix} B_{j+1} \\ \vdots \\ B_N \end{bmatrix} = \begin{bmatrix} B_{j+1} \\ \vdots \\ B_N \end{bmatrix} - \begin{bmatrix} L_{j+1,j} \\ \vdots \\ L_{N,j} \end{bmatrix} X_j.$$

For this to be handled as a matrix multiplication in a given architecture
it is clear that the blocking in (3.1.3) must give sufficiently "big"
$X_j$. Let us assume that this is the case if each $X_j$ has at least r rows.
This can be accomplished if N = ceil(n/r) and $X_1, \ldots, X_{N-1} \in \mathbb{R}^{r\times q}$ and
$X_N \in \mathbb{R}^{(n-(N-1)r)\times q}$.


3.1.5 The Level-3 Fraction


It is handy to adopt a measure that quantifies the amount of matrix multi- 
plication in a given algorithm. To this end we define the level-3 fraction of 
an algorithm to be the fraction of flops that occur in the context of matrix 
multiplication. We call such flops level-3 flops. 

Let us determine the level-3 fraction for (3.1.4) with the simplifying
assumption that $n = rN$. (The same conclusions hold with the unequal
blocking described above.) Because there are N applications of r-by-r
forward elimination (the level-2 portion of the computation) and $n^2$ flops
overall, the level-3 fraction is approximately given by

$$1 - \frac{N r^2}{n^2} = 1 - \frac{1}{N}.$$

Thus, for large N almost all flops are level-3 flops and it makes sense to
choose N as large as possible subject to the constraint that the underlying
architecture can achieve a high level of performance when processing block
saxpy's of width at least $r = n/N$.


3.1.6 Non-square Triangular System Solving


The problem of solving nonsquare, m-by-n triangular systems deserves some
mention. Consider first the lower triangular case when $m \ge n$, i.e.,

$$\begin{bmatrix} L_{11} \\ L_{21} \end{bmatrix} x = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \qquad \begin{aligned} L_{11} &\in \mathbb{R}^{n\times n}, & b_1 &\in \mathbb{R}^{n} \\ L_{21} &\in \mathbb{R}^{(m-n)\times n}, & b_2 &\in \mathbb{R}^{m-n}. \end{aligned}$$

Assume that $L_{11}$ is lower triangular and nonsingular. If we apply forward
elimination to $L_{11} x = b_1$ then $x$ solves the system provided $L_{21}(L_{11}^{-1} b_1) =
b_2$. Otherwise, there is no solution to the overall system. In such a case
least squares minimization may be appropriate. See Chapter 5.

Now consider the lower triangular system $Lx = b$ when the number
of columns n exceeds the number of rows m. In this case apply forward
substitution to the square system L(1:m,1:m)x(1:m) = b and prescribe
an arbitrary value for x(m+1:n). See §5.7 for additional comments on
systems that have more unknowns than equations.

The handling of nonsquare upper triangular systems is similar. Details 
are left to the reader. 


3.1.7 Unit Triangular Systems 


A unit triangular matrix is a triangular matrix with ones on the diagonal.
Many of the triangular matrix computations that follow have this added
bit of structure. It clearly poses no difficulty in the above procedures.


3.1.8 The Algebra of Triangular Matrices


For future reference we list a few properties about products and inverses of
triangular and unit triangular matrices.

• The inverse of an upper (lower) triangular matrix is upper (lower)
triangular.

• The product of two upper (lower) triangular matrices is upper (lower)
triangular.

• The inverse of a unit upper (lower) triangular matrix is unit upper
(lower) triangular.

• The product of two unit upper (lower) triangular matrices is unit
upper (lower) triangular.


Problems 


P3.1.1 Give an algorithm for computing a nonzero $z \in \mathbb{R}^n$ such that $Uz = 0$ where
$U \in \mathbb{R}^{n\times n}$ is upper triangular with $u_{nn} = 0$ and $u_{11} \cdots u_{n-1,n-1} \ne 0$.

P3.1.2 Discuss how the determinant of a square triangular matrix could be computed
with minimum risk of overflow and underflow.

P3.1.3 Rewrite Algorithm 3.1.4 given that $U$ is stored by column in a length n(n+1)/2
array u.vec.

P3.1.4 Write a detailed version of (3.1.4). Do not assume that N divides n.

P3.1.5 Prove all the facts about triangular matrices that are listed in §3.1.8.

P3.1.6 Suppose $S, T \in \mathbb{R}^{n\times n}$ are upper triangular and that $(ST - \lambda I)x = b$ is a
nonsingular system. Give an $O(n^2)$ algorithm for computing $x$. Note that the explicit
formation of $ST - \lambda I$ requires $O(n^3)$ flops. Hint. Suppose

$$S_+ = \begin{bmatrix} \sigma & u^T \\ 0 & S_c \end{bmatrix}, \qquad T_+ = \begin{bmatrix} \tau & v^T \\ 0 & T_c \end{bmatrix}, \qquad b_+ = \begin{bmatrix} \beta \\ b_c \end{bmatrix}$$

where $S_+ = S(k-1{:}n, k-1{:}n)$, $T_+ = T(k-1{:}n, k-1{:}n)$, $b_+ = b(k-1{:}n)$ and $\sigma, \tau, \beta \in \mathbb{R}$.
Show that if we have a vector $x_c$ such that

$$(S_c T_c - \lambda I)x_c = b_c$$

and $w_c = T_c x_c$ is available, then

$$x_+ = \begin{bmatrix} \gamma \\ x_c \end{bmatrix}, \qquad \gamma = \frac{\beta - \sigma v^T x_c - u^T w_c}{\sigma\tau - \lambda}$$

solves $(S_+ T_+ - \lambda I)x_+ = b_+$. Observe that $x_+$ and $w_+ = T_+ x_+$ each require $O(n - k)$
flops.

P3.1.7 Suppose the matrices $R_1, \ldots, R_p \in \mathbb{R}^{n\times n}$ are all upper triangular. Give an
$O(pn^2)$ algorithm for solving the system $(R_1 \cdots R_p - \lambda I)x = b$ assuming that the matrix
of coefficients is nonsingular. Hint. Generalize the solution to the previous problem.


Notes and References for Sec. 3.1

The accuracy of triangular system solvers is analyzed in


N.J. Higham (1989). "The Accuracy of Solutions to Triangular Systems," SIAM J. Num.
Anal. 26, 1252-1265.


3.2 The LU Factorization


As we have just seen, triangular systems are "easy" to solve. The idea
behind Gaussian elimination is to convert a given system $Ax = b$ to an
equivalent triangular system. The conversion is achieved by taking appropriate
linear combinations of the equations. For example, in the system

$$\begin{aligned} 3x_1 + 5x_2 &= 9 \\ 6x_1 + 7x_2 &= 4 \end{aligned}$$

if we multiply the first equation by 2 and subtract it from the second we
obtain

$$\begin{aligned} 3x_1 + 5x_2 &= 9 \\ -3x_2 &= -14. \end{aligned}$$

This is n = 2 Gaussian elimination. Our objective in this section is to give
a complete specification of this central procedure and to describe what it
does in the language of matrix factorizations. This means showing that
the algorithm computes a unit lower triangular matrix $L$ and an upper
triangular matrix $U$ so that $A = LU$, e.g.,

$$\begin{bmatrix} 3 & 5 \\ 6 & 7 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 3 & 5 \\ 0 & -3 \end{bmatrix}.$$

The solution to the original $Ax = b$ problem is then found by a two step
triangular solve process:

$$Ly = b, \quad Ux = y \quad \Longrightarrow \quad Ax = LUx = Ly = b.$$
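
For the 2-by-2 example above, the two-step solve can be carried out exactly with rational arithmetic. The sketch below hard-codes the factors $L$ and $U$ shown in the text; the use of Python's `fractions` module and the variable names are choices made for this illustration.

```python
from fractions import Fraction as F

L = [[F(1), F(0)], [F(2), F(1)]]      # unit lower triangular factor
U = [[F(3), F(5)], [F(0), F(-3)]]     # upper triangular factor
b = [F(9), F(4)]

# Step 1: forward substitution, Ly = b.
y = [b[0], b[1] - L[1][0] * b[0]]

# Step 2: back substitution, Ux = y.
x2 = y[1] / U[1][1]
x1 = (y[0] - U[0][1] * x2) / U[0][0]
x = [x1, x2]
```

The intermediate vector `y` is (9, -14), the right-hand side of the reduced triangular system, and `x` satisfies both original equations exactly.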


The LU factorization is a "high-level" algebraic description of Gaussian
elimination. Expressing the outcome of a matrix algorithm in the "language"
of matrix factorizations is a worthwhile activity. It facilitates generalization
and highlights connections between algorithms that may appear
very different at the scalar level.


3.2.1 Gauss Transformations 


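To obtain a factorization description of Gaussian elimination we need a
matrix description of the zeroing process. At the n = 2 level if $x_1 \ne 0$ and
$\tau = x_2 / x_1$, then

$$\begin{bmatrix} 1 & 0 \\ -\tau & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ 0 \end{bmatrix}.$$

More generally, suppose $x \in \mathbb{R}^n$ with $x_k \ne 0$. If

$$\tau^T = (\underbrace{0, \ldots, 0}_{k}, \tau_{k+1}, \ldots, \tau_n), \qquad \tau_i = \frac{x_i}{x_k}, \quad i = k+1{:}n$$

and we define

$$M_k = I - \tau e_k^T, \qquad (3.2.1)$$

then

$$M_k x = \begin{bmatrix} 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & -\tau_{k+1} & 1 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & -\tau_n & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_k \\ x_{k+1} \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ \vdots \\ x_k \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

In general, a matrix of the form $M_k = I - \tau e_k^T \in \mathbb{R}^{n\times n}$ is a Gauss transformation
if the first k components of $\tau \in \mathbb{R}^n$ are zero. Such a matrix is
unit lower triangular. The components of $\tau(k+1{:}n)$ are called multipliers.
The vector $\tau$ is called the Gauss vector.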


3.2.2 Applying Gauss Transformations 


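Multiplication by a Gauss transformation is particularly simple. If $C \in \mathbb{R}^{n\times r}$
and $M_k = I - \tau e_k^T$ is a Gauss transform, then

$$M_k C = (I - \tau e_k^T)C = C - \tau(e_k^T C) = C - \tau C(k,:)$$

is an outer product update. Since $\tau(1{:}k) = 0$ only C(k+1:n,:) is affected
and the update $C = M_k C$ can be computed row-by-row as follows:


    for i = k+1:n
        C(i,:) = C(i,:) - τ(i)C(k,:)
    end


This computation requires $2(n - k)r$ flops.


Example 3.2.1

$$C = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{bmatrix}, \quad \tau = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \quad \Longrightarrow \quad (I - \tau e_1^T)C = \begin{bmatrix} 1 & 4 & 7 \\ 1 & 1 & 1 \\ 4 & 10 & 17 \end{bmatrix}.$$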

3.2.3 Roundoff Properties of Gauss Transforms 


If $\hat{\tau}$ is the computed version of an exact Gauss vector $\tau$, then it is easy to
verify that

$$\hat{\tau} = \tau + e, \qquad |e| \le \mathbf{u}|\tau|.$$

If $\hat{\tau}$ is used in a Gauss transform update and $\mathrm{fl}((I - \hat{\tau} e_k^T)C)$ denotes the
computed result, then

$$\mathrm{fl}((I - \hat{\tau} e_k^T)C) = (I - \tau e_k^T)C + E,$$

where

$$|E| \le 3\mathbf{u}\left( |C| + |\tau|\, |C(k,:)| \right) + O(\mathbf{u}^2).$$

Clearly, if $\tau$ has large components, then the errors in the update may be
large in comparison to $|C|$. For this reason, care must be exercised when
Gauss transformations are employed, a matter that is pursued in §3.4.


3.2.4 Upper Triangularizing 


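Assume that $A \in \mathbb{R}^{n\times n}$. Gauss transformations $M_1, \ldots, M_{n-1}$ can usually
be found such that $M_{n-1} \cdots M_2 M_1 A = U$ is upper triangular. To see this
we first look at the n = 3 case. Suppose

$$A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{bmatrix}.$$

If

$$M_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \end{bmatrix},$$

then

$$M_1 A = \begin{bmatrix} 1 & 4 & 7 \\ 0 & -3 & -6 \\ 0 & -6 & -11 \end{bmatrix}.$$

Likewise,

$$M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \quad \Longrightarrow \quad M_2(M_1 A) = \begin{bmatrix} 1 & 4 & 7 \\ 0 & -3 & -6 \\ 0 & 0 & 1 \end{bmatrix}.$$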

Extrapolating from this example, observe that during the kth step

• We are confronted with a matrix $A^{(k-1)} = M_{k-1} \cdots M_1 A$ that is
upper triangular in columns 1 to $k-1$.

• The multipliers in $M_k$ are based on $A^{(k-1)}(k+1:n, k)$. In particular,
we need $a_{kk}^{(k-1)} \neq 0$ to proceed.

Noting that complete upper triangularization is achieved after $n-1$ steps
we therefore obtain

    k = 1
    while (A(k,k) ≠ 0) & (k ≤ n-1)
        τ(k+1:n) = A(k+1:n,k)/A(k,k)
        A(k+1:n,:) = A(k+1:n,:) - τ(k+1:n)A(k,:)          (3.2.2)
        k = k+1
    end


The entry A(k,k) must be checked to avoid a zero divide. These quantities
are referred to as the pivots and their relative magnitude turns out to be
critically important.
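
As a minimal sketch of (3.2.2), the following Python function carries out the loop on a copy of A and reports the value of k on exit. The name `upper_triangularize`, the list-of-rows storage, and the use of exact `Fraction` arithmetic are assumptions made for illustration.

```python
from fractions import Fraction

def upper_triangularize(A):
    """Run (3.2.2) on a copy of A; return (U, k) where k is the 1-based exit value."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    k = 1
    while k <= n - 1 and A[k - 1][k - 1] != 0:      # stop on a zero pivot
        pivot_row = A[k - 1]
        for i in range(k, n):                       # rows k+1..n
            tau_i = A[i][k - 1] / pivot_row[k - 1]  # multiplier
            A[i] = [a - tau_i * p for a, p in zip(A[i], pivot_row)]
        k += 1
    return A, k

U, k = upper_triangularize([[1, 4, 7], [2, 5, 8], [3, 6, 10]])
_, k_fail = upper_triangularize([[1, 2, 3], [2, 4, 7], [3, 5, 3]])
```

For the 3-by-3 matrix used above the loop exits with k = n, so the triangularization is complete. For the second matrix the loop stops early with k = 2 because the second pivot is zero.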


3.2.5 The LU Factorization 


In matrix language, if (3.2.2) terminates with $k = n$, then it computes
Gauss transforms $M_1, \ldots, M_{n-1}$ such that $M_{n-1} \cdots M_1 A = U$ is upper
triangular. It is easy to check that if $M_k = I - \tau^{(k)} e_k^T$, then its inverse is
prescribed by $M_k^{-1} = I + \tau^{(k)} e_k^T$ and so

$$
A = LU \tag{3.2.3}
$$

where

$$
L = M_1^{-1} \cdots M_{n-1}^{-1}. \tag{3.2.4}
$$

It is clear that $L$ is a unit lower triangular matrix because each $M_k^{-1}$ is unit
lower triangular. The factorization (3.2.3) is called the LU factorization of
$A$.

As suggested by the need to check for zero pivots in (3.2.2), the LU
factorization need not exist. For example, it is impossible to find $l_{ij}$ and
$u_{ij}$ so

$$
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 7 \\ 3 & 5 & 3 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix}
\begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}.
$$

To see this, equate entries and observe that we must have $u_{11} = 1$, $u_{12} = 2$,
$l_{21} = 2$, $u_{22} = 0$, and $l_{31} = 3$. But when we then look at the (3,2) entry
we obtain the contradictory equation $5 = l_{31}u_{12} + l_{32}u_{22} = 6$.

As we now show, a zero pivot in (3.2.2) can be identified with a singular
leading principal submatrix.


Theorem 3.2.1 $A \in \mathbb{R}^{n \times n}$ has an LU factorization if $\det(A(1:k, 1:k)) \neq 0$
for $k = 1:n-1$. If the LU factorization exists and $A$ is nonsingular, then
the LU factorization is unique and $\det(A) = u_{11} \cdots u_{nn}$.

Proof. Suppose $k-1$ steps in (3.2.2) have been executed. At the beginning
of step $k$ the matrix $A$ has been overwritten by $M_{k-1} \cdots M_1 A = A^{(k-1)}$.
Note that $a_{kk}^{(k-1)}$ is the kth pivot. Since the Gauss transformations are
unit lower triangular, it follows by looking at the leading k-by-k portion of
this equation that $\det(A(1:k, 1:k)) = a_{11}^{(k-1)} \cdots a_{kk}^{(k-1)}$. Thus, if $A(1:k, 1:k)$
is nonsingular then the kth pivot is nonzero.

As for uniqueness, if $A = L_1U_1$ and $A = L_2U_2$ are two LU factorizations
of a nonsingular $A$, then $L_2^{-1}L_1 = U_2U_1^{-1}$. Since $L_2^{-1}L_1$ is unit lower
triangular and $U_2U_1^{-1}$ is upper triangular, it follows that both of these
matrices must equal the identity. Hence, $L_1 = L_2$ and $U_1 = U_2$.

Finally, if $A = LU$ then $\det(A) = \det(LU) = \det(L)\det(U) =
\det(U) = u_{11} \cdots u_{nn}$. □


3.2.6 Some Practical Details 


From the practical point of view there are several improvements that can
be made to (3.2.2). First, because zeros have already been introduced in
columns 1 through k-1, the Gauss transform update need only be applied
to columns k through n. Of course, we need not even apply the kth Gauss
transform to A(:,k) since we know the result. So the efficient thing to do
is simply to update A(k+1:n, k+1:n). Another worthwhile observation is
that the multipliers associated with $M_k$ can be stored in the locations that
they zero, i.e., A(k+1:n, k). With these changes we obtain the following
version of (3.2.2):


Algorithm 3.2.1 (Outer Product Gaussian Elimination) Suppose
$A \in \mathbb{R}^{n \times n}$ has the property that A(1:k, 1:k) is nonsingular for k = 1:n-1.
This algorithm computes the factorization $M_{n-1} \cdots M_1 A = U$ where $U$ is
upper triangular and each $M_k$ is a Gauss transform. $U$ is stored in the
upper triangle of A. The multipliers associated with $M_k$ are stored in
A(k+1:n, k), i.e., A(k+1:n, k) = $-M_k$(k+1:n, k).

    for k = 1:n-1
        rows = k+1:n
        A(rows,k) = A(rows,k)/A(k,k)
        A(rows,rows) = A(rows,rows) - A(rows,k)A(k,rows)
    end


This algorithm involves $2n^3/3$ flops and it is one of several formulations of
Gaussian elimination. Note that each pass through the k-loop involves an
outer product.
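
Algorithm 3.2.1 translates almost line for line into Python. The sketch below is illustrative only: `outer_product_lu` is a hypothetical name, indices are 0-based, and `Fraction` entries are assumed so that the stored multipliers are exact.

```python
from fractions import Fraction

def outer_product_lu(A):
    """Overwrite A in place: U in the upper triangle, multipliers below the diagonal."""
    n = len(A)
    for k in range(n - 1):
        if A[k][k] == 0:
            raise ZeroDivisionError("zero pivot at step %d" % (k + 1))
        for i in range(k + 1, n):
            A[i][k] = A[i][k] / A[k][k]          # multiplier stored where the zero would go
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]     # outer product update of the trailing block
    return A

A = [[Fraction(v) for v in row] for row in [[1, 4, 7], [2, 5, 8], [3, 6, 10]]]
outer_product_lu(A)
```

After the call, the strictly lower part of `A` holds the multipliers 2, 3, 2 and the upper triangle holds U, which is the packed form discussed in the next two subsections.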


3.2.7 Where is L? 


Algorithm 3.2.1 represents $L$ in terms of the multipliers. In particular, if
$\tau^{(k)}$ is the vector of multipliers associated with $M_k$ then upon termination,
A(k+1:n, k) = $\tau^{(k)}$(k+1:n). One of the more happy "coincidences" in matrix
computations is that if $L = M_1^{-1} \cdots M_{n-1}^{-1}$, then L(k+1:n, k) = $\tau^{(k)}$(k+1:n). This
follows from a careful look at the product that defines $L$. Indeed,

$$
L = \left(I + \tau^{(1)} e_1^T\right) \cdots \left(I + \tau^{(n-1)} e_{n-1}^T\right)
= I + \sum_{k=1}^{n-1} \tau^{(k)} e_k^T.
$$

Since A(k+1:n, k) houses the kth vector of multipliers $\tau^{(k)}$, it follows that
A(i,k) houses $l_{ik}$ for all $i > k$.


3.2.8 Solving a Linear System 


Once A has been factored via Algorithm 3.2.1, then $L$ and $U$ are represented
in the array A. We can then solve the system $Ax = b$ via the triangular
systems $Ly = b$ and $Ux = y$ by using the methods of §3.1.


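Example 3.2.2 If Algorithm 3.2.1 is applied to

$$
A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 4 & 7 \\ 0 & -3 & -6 \\ 0 & 0 & 1 \end{bmatrix},
$$

then upon completion,

$$
A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & -3 & -6 \\ 3 & 2 & 1 \end{bmatrix}.
$$

If $b = (1, 1, 1)^T$, then $y = (1, -1, 0)^T$ solves $Ly = b$ and $x = (-1/3, 1/3, 0)^T$ solves
$Ux = y$.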
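
The two triangular solves can be run directly on the packed array. The following is a sketch under the assumption that the array holds the multipliers below the diagonal and U on and above it; the function name `solve_with_packed_lu` is hypothetical, and the substitution loops are plain row-oriented versions rather than the specific algorithms of §3.1.

```python
from fractions import Fraction

def solve_with_packed_lu(LU, b):
    """Return (y, x) with L y = b and U x = y, reading L and U from the packed array."""
    n = len(LU)
    y = list(b)
    for i in range(n):                      # forward substitution, L has unit diagonal
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    x = list(y)
    for i in range(n - 1, -1, -1):          # back substitution with U
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return y, x

LU = [[Fraction(v) for v in row] for row in [[1, 4, 7], [2, -3, -6], [3, 2, 1]]]
y, x = solve_with_packed_lu(LU, [Fraction(1)] * 3)
```

With the packed array of Example 3.2.2 and b = (1, 1, 1), this reproduces the vectors y and x quoted in the example.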


3.2.9 Other Versions 


Gaussian elimination, like matrix multiplication, is a triple-loop procedure
that can be arranged in several ways. Algorithm 3.2.1 corresponds to the
"kij" version of Gaussian elimination if we compute the outer product
update row-by-row:

    for k = 1:n-1
        A(k+1:n,k) = A(k+1:n,k)/A(k,k)
        for i = k+1:n
            for j = k+1:n
                A(i,j) = A(i,j) - A(i,k)A(k,j)
            end
        end
    end


There are five other versions: kji, ikj, ijk, jik, and jki. The last of these
results in an implementation that features a sequence of gaxpy's and forward
eliminations. In this formulation, the Gauss transformations are not
immediately applied to A as they are in the outer product version. Instead,
their application is delayed. The original A(:,j) is untouched until step j.
At that point in the algorithm A(:,j) is overwritten by $M_{j-1} \cdots M_1$A(:,j).
The jth Gauss transformation is then computed.

To be precise, suppose $1 \le j \le n-1$ and assume that L(:,1:j-1)
and U(1:j-1,1:j-1) are known. This means that the first j-1 columns
of L and U are available. To get the jth columns of L and U we equate
jth columns in the equation A = LU: A(:,j) = LU(:,j). From this we
conclude that

$$
A(1:j-1, j) = L(1:j-1, 1:j-1)\,U(1:j-1, j)
$$

and

$$
A(j:n, j) = \sum_{k=1}^{j} L(j:n, k)\,U(k, j).
$$

The first equation is a lower triangular system that can be solved for the
vector U(1:j-1,j). Once this is accomplished, the second equation can be
rearranged to produce recipes for U(j,j) and L(j+1:n,j). Indeed, if we
set

$$
v(j:n) = A(j:n, j) - \sum_{k=1}^{j-1} L(j:n, k)\,U(k, j)
= A(j:n, j) - L(j:n, 1:j-1)\,U(1:j-1, j),
$$

then L(j+1:n,j) = v(j+1:n)/v(j) and U(j,j) = v(j). Thus, L(j+1:n,j)
is a scaled gaxpy and we obtain

    L = I; U = 0
    for j = 1:n
        if j = 1
            v(j:n) = A(j:n,j)
        else
            Solve L(1:j-1,1:j-1)z = A(1:j-1,j) for z          (3.2.5)
            and set U(1:j-1,j) = z.
            v(j:n) = A(j:n,j) - L(j:n,1:j-1)z
        end
        if j < n
            L(j+1:n,j) = v(j+1:n)/v(j)
        end
        U(j,j) = v(j)
    end

This arrangement of Gaussian elimination is rich in forward eliminations
and gaxpy operations and, like Algorithm 3.2.1, requires $2n^3/3$ flops.
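
A compact Python rendering of the gaxpy arrangement (3.2.5) is sketched below. The name `gaxpy_lu` is hypothetical, indices are 0-based, and the j = 1 branch of the pseudocode is absorbed by empty sums rather than written out separately.

```python
from fractions import Fraction

def gaxpy_lu(A):
    """Column-by-column (jki) LU: returns unit lower triangular L and upper triangular U."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        # Forward elimination: solve L(1:j-1,1:j-1) z = A(1:j-1,j).
        z = []
        for i in range(j):
            z.append(Fraction(A[i][j]) - sum(L[i][k] * z[k] for k in range(i)))
            U[i][j] = z[i]
        # Gaxpy: v(j:n) = A(j:n,j) - L(j:n,1:j-1) z.
        v = [Fraction(A[i][j]) - sum(L[i][k] * z[k] for k in range(j))
             for i in range(j, n)]
        for i in range(j + 1, n):
            L[i][j] = v[i - j] / v[0]        # scaled gaxpy gives column j of L
        U[j][j] = v[0]
    return L, U

L, U = gaxpy_lu([[1, 4, 7], [2, 5, 8], [3, 6, 10]])
```

Column j of A is not read until step j, which is the delayed application of the Gauss transformations described above.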
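3.2.10 Block LU


It is possible to organize Gaussian elimination so that matrix multiplication
becomes the dominant operation. The key to the derivation of this block
procedure is to partition $A \in \mathbb{R}^{n \times n}$ as follows

$$
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
$$

where $A_{11}$ is r-by-r, $A_{22}$ is (n-r)-by-(n-r), and r is a blocking parameter.
Suppose we compute the LU factorization
$L_{11}U_{11} = A_{11}$ and then solve the multiple right hand side triangular systems
$L_{11}U_{12} = A_{12}$ and $L_{21}U_{11} = A_{21}$ for $U_{12}$ and $L_{21}$ respectively. It follows
that

$$
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} =
\begin{bmatrix} L_{11} & 0 \\ L_{21} & I_{n-r} \end{bmatrix}
\begin{bmatrix} I_r & 0 \\ 0 & \tilde{A} \end{bmatrix}
\begin{bmatrix} U_{11} & U_{12} \\ 0 & I_{n-r} \end{bmatrix}
$$

where $\tilde{A} = A_{22} - L_{21}U_{12}$. The matrix $\tilde{A}$ is the Schur complement of $A_{11}$
with respect to $A$. Note that if $\tilde{A} = L_{22}U_{22}$ is the LU factorization of $\tilde{A}$,
then

$$
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} =
\begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}
\begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix}
$$

is the LU factorization of $A$. Thus, after $L_{11}$, $L_{21}$, $U_{11}$ and $U_{12}$ are computed,
we repeat the process on the level-3 updated (2,2) block $\tilde{A}$.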


Algorithm 3.2.2 (Block Outer Product LU) Suppose $A \in \mathbb{R}^{n \times n}$
and that det(A(1:k, 1:k)) is nonzero for k = 1:n-1. Assume that r satisfies
$1 \le r \le n$. The following algorithm computes A = LU via rank r updates.
Upon completion, A(i,j) is overwritten with L(i,j) for i > j and A(i,j) is
overwritten with U(i,j) if j ≥ i.

    λ = 1
    while λ ≤ n
        μ = min(n, λ+r-1)
        Use Algorithm 3.2.1 to overwrite A(λ:μ, λ:μ)
            with its LU factors L̃ and Ũ.
        Solve L̃Z = A(λ:μ, μ+1:n) for Z and overwrite
            A(λ:μ, μ+1:n) with Z.
        Solve WŨ = A(μ+1:n, λ:μ) for W and overwrite
            A(μ+1:n, λ:μ) with W.
        A(μ+1:n, μ+1:n) = A(μ+1:n, μ+1:n) - WZ
        λ = μ+1
    end

This algorithm involves $2n^3/3$ flops.

Recalling the discussion in §3.1.5, let us consider the level-3 fraction
for this procedure assuming that r is large enough so that the underlying
computer is able to compute the matrix multiply update A(μ+1:n, μ+1:n)
= A(μ+1:n, μ+1:n) - WZ at "level-3 speed." Assume for clarity
that n = rN. The only flops that are not level-3 flops occur in the context
of the r-by-r LU factorizations A(λ:μ, λ:μ) = L̃Ũ. Since there are N such
systems solved in the overall computation, we see that the level-3 fraction
is given by

$$
1 - \frac{N(2r^3/3)}{2n^3/3} = 1 - \frac{1}{N^2}.
$$

Thus, for large N almost all arithmetic takes place in the context of matrix
multiplication. As we have mentioned, this ensures high performance on a
wide range of computing environments.
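
The structure of Algorithm 3.2.2 can be sketched in Python as follows. This is an illustration of the loop organization only, not a high-performance code: the name `block_lu` is hypothetical, indices and block boundaries are 0-based half-open ranges, and the "level-3" update is written as explicit loops rather than a call to a tuned matrix multiply.

```python
from fractions import Fraction

def block_lu(A, r):
    """Overwrite A with its packed LU factors, processing diagonal blocks of order r."""
    n = len(A)
    lam = 0
    while lam < n:
        mu = min(n, lam + r)                     # diagonal block is A[lam:mu, lam:mu]
        # Unblocked elimination on the diagonal block (as in Algorithm 3.2.1).
        for k in range(lam, mu):
            for i in range(k + 1, mu):
                A[i][k] /= A[k][k]
                for j in range(k + 1, mu):
                    A[i][j] -= A[i][k] * A[k][j]
        # Solve L Z = A[lam:mu, mu:n] by forward substitution (unit diagonal).
        for j in range(mu, n):
            for i in range(lam, mu):
                for k in range(lam, i):
                    A[i][j] -= A[i][k] * A[k][j]
        # Solve W U = A[mu:n, lam:mu], one column of W at a time.
        for i in range(mu, n):
            for j in range(lam, mu):
                for k in range(lam, j):
                    A[i][j] -= A[i][k] * A[k][j]
                A[i][j] /= A[j][j]
        # Rank-r update of the trailing block: A22 <- A22 - W Z.
        for i in range(mu, n):
            for j in range(mu, n):
                A[i][j] -= sum(A[i][k] * A[k][j] for k in range(lam, mu))
        lam = mu
    return A

def demo(r):
    A = [[Fraction(v) for v in row] for row in [[1, 4, 7], [2, 5, 8], [3, 6, 10]]]
    return block_lu(A, r)
```

Because the LU factorization of a nonsingular matrix is unique, `demo(r)` should return the same packed array for every admissible block size r.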


3.2.11 The LU Factorization of a Rectangular Matrix 


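The LU factorization of a rectangular matrix $A \in \mathbb{R}^{m \times n}$ can also be
performed. The m > n case is illustrated by

$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 3 & 1 \\ 5 & 2 \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix}
$$

while

$$
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \end{bmatrix}
$$

depicts the m < n situation. The LU factorization of $A \in \mathbb{R}^{m \times n}$ is
guaranteed to exist if A(1:k, 1:k) is nonsingular for k = 1:min(m,n).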
The square LU factorization algorithms above need only minor modification
to handle the rectangular case. For example, to handle the m > n
case we modify Algorithm 3.2.1 as follows:

    for k = 1:n
        rows = k+1:m
        A(rows,k) = A(rows,k)/A(k,k)
        if k < n
            cols = k+1:n
            A(rows,cols) = A(rows,cols) - A(rows,k)A(k,cols)
        end
    end

This algorithm requires $mn^2 - n^3/3$ flops.
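
The rectangular variant differs from the square code only in its loop bounds. A minimal sketch, assuming m ≥ n, list-of-rows storage, exact `Fraction` entries, and the hypothetical name `rectangular_lu`:

```python
from fractions import Fraction

def rectangular_lu(A):
    """Overwrite an m-by-n array (m >= n) with multipliers below the diagonal and U above."""
    m, n = len(A), len(A[0])
    for k in range(n):
        for i in range(k + 1, m):
            A[i][k] /= A[k][k]                   # multiplier for row i
            for j in range(k + 1, n):            # empty when k is the last column
                A[i][j] -= A[i][k] * A[k][j]
    return A

A = [[Fraction(v) for v in row] for row in [[1, 2], [3, 4], [5, 6]]]
rectangular_lu(A)
```

For the 3-by-2 matrix with rows (1, 2), (3, 4), (5, 6), the packed result has multipliers 3, 5, 2 below the diagonal and U = [1 2; 0 -2] on and above it.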
3.2.12 A Note on Failure


As we know, Gaussian elimination fails unless the first n-1 principal
submatrices are nonsingular. This rules out some very simple matrices,
e.g.

$$
A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
$$

While $A$ has perfect 2-norm condition, it fails to have an LU factorization
because it has a singular leading principal submatrix.

Clearly, modifications are necessary if Gaussian elimination is to be 
effectively used in general linear system solving. The error analysis in the 
following section suggests the needed modifications. 


Problems 


P3.2.1 Suppose the entries of $A(\epsilon) \in \mathbb{R}^{n \times n}$ are continuously differentiable functions of
the scalar $\epsilon$. Assume that $A \equiv A(0)$ and all its principal submatrices are nonsingular.
Show that for sufficiently small $\epsilon$, the matrix $A(\epsilon)$ has an LU factorization $A(\epsilon) =
L(\epsilon)U(\epsilon)$ and that $L(\epsilon)$ and $U(\epsilon)$ are both continuously differentiable.

P3.2.2 Suppose we partition $A \in \mathbb{R}^{n \times n}$

$$
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
$$

where $A_{11}$ is r-by-r. Assume that $A_{11}$ is nonsingular. The matrix $S = A_{22} - A_{21}A_{11}^{-1}A_{12}$
is called the Schur complement of $A_{11}$ in $A$. Show that if $A_{11}$ has an LU factorization,
then after r steps of Algorithm 3.2.1, A(r+1:n, r+1:n) houses $S$. How could $S$ be
obtained after r steps of (3.2.5)?

P3.2.3 Suppose $A \in \mathbb{R}^{n \times n}$ has an LU factorization. Show how $Ax = b$ can be solved
without storing the multipliers by computing the LU factorization of the n-by-(n+1)
matrix [A b].

P3.2.4 Describe a variant of Gaussian elimination that introduces zeros into the columns
of A in the order n:-1:2 and which produces the factorization A = UL where U is unit
upper triangular and L is lower triangular.

P3.2.5 Matrices in $\mathbb{R}^{n \times n}$ of the form $N(y,k) = I - ye_k^T$ where $y \in \mathbb{R}^n$ are said to
be Gauss-Jordan transformations. (a) Give a formula for $N(y,k)^{-1}$ assuming it exists.
(b) Given $x \in \mathbb{R}^n$, under what conditions can y be found so $N(y,k)x = e_k$? (c) Give
an algorithm using Gauss-Jordan transformations that overwrites A with $A^{-1}$. What
conditions on A ensure the success of your algorithm?

P3.2.6 Extend (3.2.5) so that it can also handle the case when A has more rows than
columns.

P3.2.7 Show how A can be overwritten with L and U in (3.2.5). Organize the three
loops so that unit stride access prevails.

P3.2.8 Develop a version of Gaussian elimination in which the innermost of the three
loops oversees a dot product.
Notes and References for Sec. 3.2


Schur complements (P3.2.2) arise in many applications. For a survey of both practical 
and theoretical interest, see 


R.W. Cottle (1974). “Manifestations of the Schur Complement,” Lin. Alg. and Its 
Applic. 8, 189-211. 


Schur complements are known as "Gauss transforms" in some application areas. The
use of Gauss-Jordan transformations (P3.2.5) is detailed in Fox (1964). See also


T. Dekker and W. Hoffman (1989). “Rehabilitation of the Gauss-Jordan Algorithm,” 
Numer. Math. 54, 591—599. 


As we mentioned, inner product versions of Gaussian elimination have been known and
used for some time. The names of Crout and Doolittle are associated with these ijk
techniques. They were popular during the days of desk calculators because there are
far fewer intermediate results than in Gaussian elimination. These methods still have
attraction because they can be implemented with accumulated inner products. For
remarks along these lines see Fox (1964) as well as Stewart (1973, pp. 131-39). See also:


G.E. Forsythe (1960). "Crout with Pivoting," Comm. ACM 3, 507-8.
W.M. McKeeman (1962). "Crout with Equilibration and Iteration," Comm. ACM 5,
553-55.


Loop orderings and block issues in LU computations are discussed in 


J.J. Dongarra, F.G. Gustavson, and A. Karp (1984). “Implementing Linear Algebra 
Algorithms for Dense Matrices on a Vector Pipeline Machine,” SIAM Review 26, 
91-112. 

J.M. Ortega (1988). “The ijk Forms of Factorization Methods I: Vector Computers,” 
Parallel Computers 7, 135—147. 

D.H. Bailey, K.Lee, and H.D. Simon (1991). “Using Strassen's Algorithm to Accelerate 
the Solution of Linear Systems," J. Supercomputing 4, 357—371. 

J.W. Demmel, N.J. Higham, and R.S. Schreiber (1995). "Stability of Block LU
Factorization," Numer. Lin. Alg. with Applic. 2, 173-190.


3.3 Roundoff Analysis of Gaussian Elimination


We now assess the effect of rounding errors when the algorithms in the
previous two sections are used to solve the linear system $Ax = b$. A much
more detailed treatment of roundoff error in Gaussian elimination is given
in Higham (1996).

Before we proceed with the analysis, it is useful to consider the nearly
ideal situation in which no roundoff occurs during the entire solution process
except when $A$ and $b$ are stored. Thus, if $fl(b) = b + e$ and the stored matrix
$fl(A) = A + E$ is nonsingular, then we are assuming that the computed
solution $\hat{x}$ satisfies

$$
(A + E)\hat{x} = (b + e), \qquad \|E\|_\infty \le u\|A\|_\infty, \quad \|e\|_\infty \le u\|b\|_\infty. \tag{3.3.1}
$$

That is, $\hat{x}$ solves a "nearby" system exactly. Moreover, if $u\kappa_\infty(A) \le \frac{1}{2}$
(say), then by using Theorem 2.7.2, it can be shown that

$$
\frac{\|x - \hat{x}\|_\infty}{\|x\|_\infty} \le 4u\kappa_\infty(A). \tag{3.3.2}
$$

The bounds (3.3.1) and (3.3.2) are "best possible" norm bounds. No general
$\infty$-norm error analysis of a linear equation solver that requires the storage of
$A$ and $b$ can render sharper bounds. As a consequence, we cannot justifiably
criticize an algorithm for returning an inaccurate $\hat{x}$ if $A$ is ill-conditioned
relative to the machine precision, e.g., $u\kappa_\infty(A) \approx 1$.


3.3.1 Errors in the LU Factorization 


Let us see how the error bounds for Gaussian elimination compare with
the ideal bounds above. We work with the infinity norm for convenience
and focus our attention on Algorithm 3.2.1, the outer product version.
The error bounds that we derive also apply to the gaxpy formulation
(3.2.5).

Our first task is to quantify the roundoff errors associated with the 
computed triangular factors. 


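Theorem 3.3.1 Assume that $A$ is an n-by-n matrix of floating point numbers.
If no zero pivots are encountered during the execution of Algorithm
3.2.1, then the computed triangular matrices $\hat{L}$ and $\hat{U}$ satisfy

$$
\hat{L}\hat{U} = A + H \tag{3.3.3}
$$

$$
|H| \le 3(n-1)u\left(|A| + |\hat{L}||\hat{U}|\right) + O(u^2). \tag{3.3.4}
$$

Proof. The proof is by induction on n. The theorem obviously holds for
n = 1. Assume it holds for all (n-1)-by-(n-1) floating point matrices. If

$$
A = \begin{bmatrix} \alpha & w^T \\ v & B \end{bmatrix}, \qquad
\alpha \in \mathbb{R}, \quad v, w \in \mathbb{R}^{n-1}, \quad B \in \mathbb{R}^{(n-1) \times (n-1)},
$$

then $\hat{z} = fl(v/\alpha)$ and $\hat{A}_1 = fl(B - \hat{z}w^T)$ are computed in the first step of
the algorithm. We therefore have

$$
\hat{z} = \frac{1}{\alpha}v + f, \qquad |f| \le u\frac{|v|}{|\alpha|} \tag{3.3.5}
$$

and

$$
\hat{A}_1 = B - \hat{z}w^T + F, \qquad |F| \le 2u\left(|B| + |\hat{z}||w|^T\right) + O(u^2). \tag{3.3.6}
$$

The algorithm now proceeds to calculate the LU factorization of $\hat{A}_1$. By
induction, we compute approximate factors $\hat{L}_1$ and $\hat{U}_1$ for $\hat{A}_1$ that satisfy

$$
\hat{L}_1\hat{U}_1 = \hat{A}_1 + H_1 \tag{3.3.7}
$$

$$
|H_1| \le 3(n-2)u\left(|\hat{A}_1| + |\hat{L}_1||\hat{U}_1|\right) + O(u^2). \tag{3.3.8}
$$

Thus,

$$
\hat{L}\hat{U} \equiv
\begin{bmatrix} 1 & 0 \\ \hat{z} & \hat{L}_1 \end{bmatrix}
\begin{bmatrix} \alpha & w^T \\ 0 & \hat{U}_1 \end{bmatrix}
= A + \begin{bmatrix} 0 & 0 \\ \alpha f & H_1 + F \end{bmatrix}
\equiv A + H.
$$

From (3.3.6) it follows that

$$
|\hat{A}_1| \le (1 + 2u)\left(|B| + |\hat{z}||w|^T\right) + O(u^2),
$$

and therefore by using (3.3.7) and (3.3.8) we have

$$
|H_1 + F| \le 3(n-1)u\left(|B| + |\hat{z}||w|^T + |\hat{L}_1||\hat{U}_1|\right) + O(u^2).
$$

Since $|\alpha f| \le u|v|$ it is easy to verify that

$$
|H| \le 3(n-1)u\left\{
\begin{bmatrix} |\alpha| & |w|^T \\ |v| & |B| \end{bmatrix} +
\begin{bmatrix} 1 & 0 \\ |\hat{z}| & |\hat{L}_1| \end{bmatrix}
\begin{bmatrix} |\alpha| & |w|^T \\ 0 & |\hat{U}_1| \end{bmatrix}
\right\} + O(u^2),
$$

thereby proving the theorem. □

We mention that if $A$ is m-by-n, then the theorem applies with n in (3.3.4)
replaced by the smaller of n and m.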


3.3.2 Triangular Solving with Inexact Triangles 


We next examine the effect of roundoff error when $\hat{L}$ and $\hat{U}$ are used by the
triangular system solvers of §3.1.

Theorem 3.3.2 Let $\hat{L}$ and $\hat{U}$ be the computed LU factors of the n-by-n
floating point matrix $A$ obtained by either Algorithm 3.2.1 or the gaxpy
procedure (3.2.5). Suppose the methods of §3.1 are used to produce the computed
solution $\hat{y}$ to $\hat{L}y = b$ and the computed solution $\hat{x}$ to $\hat{U}x = \hat{y}$. Then
$(A + E)\hat{x} = b$ with

$$
|E| \le nu\left(3|A| + 5|\hat{L}||\hat{U}|\right) + O(u^2). \tag{3.3.9}
$$

Proof. From (3.1.1) and (3.1.2) we have

$$
(\hat{L} + F)\hat{y} = b, \qquad |F| \le nu|\hat{L}| + O(u^2)
$$

$$
(\hat{U} + G)\hat{x} = \hat{y}, \qquad |G| \le nu|\hat{U}| + O(u^2)
$$

and thus

$$
(\hat{L} + F)(\hat{U} + G)\hat{x} = (\hat{L}\hat{U} + F\hat{U} + \hat{L}G + FG)\hat{x} = b.
$$

From Theorem 3.3.1

$$
\hat{L}\hat{U} = A + H,
$$

with $|H| \le 3(n-1)u(|A| + |\hat{L}||\hat{U}|) + O(u^2)$, and so by defining

$$
E = H + F\hat{U} + \hat{L}G + FG
$$

we find $(A + E)\hat{x} = b$. Moreover,

$$
\begin{aligned}
|E| &\le |H| + |F||\hat{U}| + |\hat{L}||G| + O(u^2) \\
    &\le 3nu\left(|A| + |\hat{L}||\hat{U}|\right) + 2nu\left(|\hat{L}||\hat{U}|\right) + O(u^2). \qquad \Box
\end{aligned}
$$

Were it not for the possibility of a large $|\hat{L}||\hat{U}|$ term, (3.3.9) would compare
favorably with the ideal bound in (3.3.1). (The factor n is of no consequence,
cf. the Wilkinson quotation in §2.4.6.) Such a possibility exists, for
there is nothing in Gaussian elimination to rule out the appearance of small
pivots. If a small pivot is encountered, then we can expect large numbers
to be present in $\hat{L}$ and $\hat{U}$.

We stress that small pivots are not necessarily due to ill-conditioning as
the example

$$
A = \begin{bmatrix} \epsilon & 1 \\ 1 & 0 \end{bmatrix}
$$

bears out. Thus, Gaussian elimination can give
arbitrarily poor results, even for well-conditioned problems. The method is
unstable.

In order to repair this shortcoming of the algorithm, it is necessary to
introduce row and/or column interchanges during the elimination process
with the intention of keeping the numbers that arise during the calculation
suitably bounded. This idea is pursued in the next section.


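Example 3.3.1 Suppose $\beta = 10$, $t = 3$ floating point arithmetic is used to solve:

$$
\begin{bmatrix} .001 & 1.00 \\ 1.00 & 2.00 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 1.00 \\ 3.00 \end{bmatrix}.
$$

Applying Gaussian elimination we get

$$
\hat{L} = \begin{bmatrix} 1 & 0 \\ 1000 & 1 \end{bmatrix}, \qquad
\hat{U} = \begin{bmatrix} .001 & 1 \\ 0 & -1000 \end{bmatrix}
$$

and a calculation shows

$$
\hat{L}\hat{U} = \begin{bmatrix} .001 & 1 \\ 1 & 2 \end{bmatrix} +
\begin{bmatrix} 0 & 0 \\ 0 & -2 \end{bmatrix} \equiv A + H.
$$

Moreover, $6\begin{bmatrix} 10^{-6} & .001 \\ .001 & 1.001 \end{bmatrix}$ is the bounding matrix in (3.3.4), not a severe
overestimate of $|H|$. If we go on to solve the problem using the triangular system solvers of §3.1,
then using the same precision arithmetic we obtain a computed solution $\hat{x} = (0, 1)^T$.
This is in contrast to the exact solution $x = (1.002\ldots, .998\ldots)^T$.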


Problems 


P3.3.1 Show that if we drop the assumption that $A$ is a floating point matrix in
Theorem 3.3.1, then (3.3.4) holds with the coefficient "3" replaced by "4."

P3.3.2 Suppose $A$ is an n-by-n matrix and that $\hat{L}$ and $\hat{U}$ are produced by Algorithm
3.2.1. (a) How many flops are required to compute $\|\,|\hat{L}||\hat{U}|\,\|_\infty$? (b) Show $fl(|\hat{L}||\hat{U}|) \le
(1 + 2nu)|\hat{L}||\hat{U}| + O(u^2)$.

P3.3.3 Suppose $x = A^{-1}b$. Show that if $e = x - \hat{x}$ (the error) and $r = b - A\hat{x}$ (the
residual), then

$$
\frac{\|r\|}{\|A\|} \le \|e\| \le \|A^{-1}\|\,\|r\|.
$$

Assume consistency between the matrix and vector norm.
P3.3.4 Using 2-digit, base 10, floating point arithmetic, compute the LU factorization 


of 
7 6 
As]; de 


For this example, what is the matrix H in (3.3.3)? 


Notes and References for Sec. 3.3 
The original roundoff analysis of Gaussian elimination appears in


J.H. Wilkinson (1961). "Error Analysis of Direct Methods of Matrix Inversion," J. ACM
8, 281-330.


Various improvements in the bounds and simplifications in the analysis have occurred 
over the years. See 


B.A. Chartres and J.C. Geuder (1967). “Computable Error Bounds for Direct Solution 
of Linear Equations,” J. ACM 14, 63-71. 

J.K. Reid (1971). "A Note on the Stability of Gaussian Elimination," J. Inst. Math.
Applic. 8, 374-75.

C.C. Paige (1973). "An Error Analysis of a Method for Solving Matrix Equations,"
Math. Comp. 27, 355-59.

C. de Boor and A. Pinkus (1977). “A Backward Error Analysis for Totally Positive 
Linear Systems," Numer. Math. 27, 485-90. 

H.H. Robertson (1977). "The Accuracy of Error Estimates for Systems of Linear
Algebraic Equations," J. Inst. Math. Applic. 20, 409-14.

J.J. Du Croz and N.J. Higham (1992). "Stability of Methods for Matrix Inversion," IMA
J. Num. Anal. 12, 1-19.
3.4 Pivoting


The analysis in the previous section shows that we must take steps to ensure
that no large entries appear in the computed triangular factors $\hat{L}$ and $\hat{U}$.
The example

$$
A = \begin{bmatrix} .0001 & 1 \\ 1 & 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 10000 & 1 \end{bmatrix}
\begin{bmatrix} .0001 & 1 \\ 0 & -9999 \end{bmatrix} = LU
$$

correctly identifies the source of the difficulty: relatively small pivots. A
way out of this difficulty is to interchange rows. In our example, if $P$ is the
permutation

$$
P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
$$

then

$$
PA = \begin{bmatrix} 1 & 1 \\ .0001 & 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ .0001 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 0 & .9999 \end{bmatrix} = LU.
$$

Now the triangular factors are comprised of acceptably small elements.
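
The practical effect of the row interchange can be seen by simulating low-precision arithmetic. The sketch below is an illustration under assumptions that are not part of the text: it uses 3-digit decimal arithmetic via Python's `decimal` module, a right-hand side b = (1, 2)^T chosen for the demonstration, and a hypothetical helper `eliminate_2x2` that performs elimination without interchanges on whatever row order it is given.

```python
from decimal import Decimal, localcontext

def eliminate_2x2(a11, a12, a21, a22, b1, b2, digits=3):
    """Solve a 2-by-2 system by elimination without interchanges,
    rounding every arithmetic result to `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        m = a21 / a11                 # multiplier
        u22 = a22 - m * a12           # second pivot
        y2 = b2 - m * b1              # forward elimination on the right-hand side
        x2 = y2 / u22                 # back substitution
        x1 = (b1 - a12 * x2) / a11
    return x1, x2

eps, one, two = Decimal("0.0001"), Decimal(1), Decimal(2)
x_no_pivot = eliminate_2x2(eps, one, one, one, one, two)   # rows in original order
x_pivot = eliminate_2x2(one, one, eps, one, two, one)      # rows 1 and 2 interchanged
```

The exact solution of this system is approximately (1.0001, .9999). In the simulated arithmetic the unpivoted order loses the first component entirely, while the interchanged order recovers both components to working precision.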

In this section we show how to determine a permuted version of À that 
has a reasonably stable LU factorization. There are several ways to do 
this and they each correspond to a different pivoting strategy. We focus 
on partial pivoting and complete pivoting. The efficient implementation 
of these strategies and their properties are discussed. We begin with a 
discussion of permutation matrix manipulation. 


3.4.1 Permutation Matrices 


The stabilizations of Gaussian elimination that are developed in this
section involve data movements such as the interchange of two matrix rows.
In keeping with our desire to describe all computations in "matrix terms,"
it is necessary to acquire a familiarity with permutation matrices. A
permutation matrix is just the identity with its rows re-ordered, e.g.,

$$
P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}.
$$

An n-by-n permutation matrix should never be explicitly stored. It is much
more efficient to represent a general permutation matrix $P$ with an integer
n-vector p. One way to do this is to let p(k) be the column index of the
sole "1" in P's kth row. Thus, p = [4 1 3 2] is the appropriate encoding of
the above $P$. It is also possible to encode $P$ on the basis of where the "1"
occurs in each column, e.g., p = [2 4 3 1].

If $P$ is a permutation and $A$ is a matrix, then $PA$ is a row permuted
version of $A$ and $AP$ is a column permuted version of $A$. Permutation
matrices are orthogonal and so if $P$ is a permutation, then $P^{-1} = P^T$. A
product of permutation matrices is a permutation matrix.

In this section we are particularly interested in interchange permutations.
These are permutations obtained by merely swapping two rows in
the identity, e.g.,

$$
E = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}.
$$

Interchange permutations can be used to describe row and column swapping.
With the above 4-by-4 example, $EA$ is $A$ with rows 1 and 4 interchanged.
Likewise, $AE$ is $A$ with columns 1 and 4 swapped.

If $P = E_n \cdots E_1$ and each $E_k$ is the identity with rows k and p(k)
interchanged, then p(1:n) is a useful vector encoding of $P$. Indeed, $x \in \mathbb{R}^n$
can be overwritten by $Px$ as follows:

    for k = 1:n
        x(k) ↔ x(p(k))
    end

Here, the "↔" notation means "swap contents." Since each $E_k$ is symmetric
and $P^T = E_1 \cdots E_n$, the representation can also be used to overwrite $x$ with
$P^Tx$:

    for k = n:-1:1
        x(k) ↔ x(p(k))
    end


It should be noted that no floating point arithmetic is involved in a permu- 
tation operation. However, permutation matrix operations often involve the 
irregular movement of data and can represent a significant computational 
overhead. 
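
The two swap loops are easy to express in Python. The sketch below assumes a 0-based interchange vector `p` with p[k] ≥ k; the function names are hypothetical and the sample data are invented for illustration.

```python
def apply_interchanges(x, p):
    """Overwrite x with P x, where P = E_n ... E_1 and E_k swaps entries k and p[k]."""
    for k in range(len(p)):
        x[k], x[p[k]] = x[p[k]], x[k]
    return x

def apply_interchanges_transposed(x, p):
    """Overwrite x with P^T x by applying the same swaps in reverse order."""
    for k in reversed(range(len(p))):
        x[k], x[p[k]] = x[p[k]], x[k]
    return x

p = [2, 2, 3, 3]                                  # 0-based interchange vector
permuted = apply_interchanges(["a", "b", "c", "d"], p)
restored = apply_interchanges_transposed(list(permuted), p)
```

Applying the reversed loop to the permuted vector restores the original order, which reflects the identity P^T P = I. No floating point arithmetic is involved.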


3.4.2 Partial Pivoting: The Basic Idea 


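We show how interchange permutations can be used in LU computations to
guarantee that no multiplier is greater than one in absolute value. Suppose

$$
A = \begin{bmatrix} 3 & 17 & 10 \\ 2 & 4 & -2 \\ 6 & 18 & -12 \end{bmatrix}.
$$

To get the smallest possible multipliers in the first Gauss transform using
row interchanges we need $a_{11}$ to be the largest entry in the first column.
Thus, if $E_1$ is the interchange permutation

$$
E_1 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
$$

then

$$
E_1A = \begin{bmatrix} 6 & 18 & -12 \\ 2 & 4 & -2 \\ 3 & 17 & 10 \end{bmatrix}
$$

and

$$
M_1 = \begin{bmatrix} 1 & 0 & 0 \\ -1/3 & 1 & 0 \\ -1/2 & 0 & 1 \end{bmatrix}
\quad \Rightarrow \quad
M_1E_1A = \begin{bmatrix} 6 & 18 & -12 \\ 0 & -2 & 2 \\ 0 & 8 & 16 \end{bmatrix}.
$$

Now to get the smallest possible multiplier in $M_2$ we need to swap rows 2
and 3. Thus, if

$$
E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\quad \text{and} \quad
M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1/4 & 1 \end{bmatrix}
$$

then

$$
M_2E_2M_1E_1A = \begin{bmatrix} 6 & 18 & -12 \\ 0 & 8 & 16 \\ 0 & 0 & 6 \end{bmatrix}.
$$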


The example illustrates the basic idea behind the row interchanges. In
general we have:

    for k = 1:n-1
        Determine an interchange matrix E_k with E_k(1:k-1,1:k-1) = I_{k-1}
            such that if z is the kth column of E_k A, then
            |z(k)| = || z(k:n) ||_∞.
        A = E_k A
        Determine the Gauss transform M_k such that if v is the
            kth column of M_k A, then v(k+1:n) = 0.
        A = M_k A
    end

This particular row interchange strategy is called partial pivoting. Upon
completion we emerge with $M_{n-1}E_{n-1} \cdots M_1E_1A = U$, an upper triangular
matrix.

As a consequence of the partial pivoting, no multiplier is larger than
one in absolute value. This is because

$$
\left| \left( E_k M_{k-1} \cdots M_1 E_1 A \right)_{kk} \right| =
\max_{k \le i \le n} \left| \left( E_k M_{k-1} \cdots M_1 E_1 A \right)_{ik} \right|
$$

for k = 1:n-1. Thus, partial pivoting effectively guards against arbitrarily
large multipliers.


3.4.3 Partial Pivoting Details 


We are now set to detail the overall Gaussian elimination with partial
pivoting algorithm.


Algorithm 3.4.1 (Gauss Elimination with Partial Pivoting) If
$A \in \mathbb{R}^{n \times n}$, then this algorithm computes Gauss transforms $M_1, \ldots, M_{n-1}$
and interchange permutations $E_1, \ldots, E_{n-1}$ such that $M_{n-1}E_{n-1} \cdots M_1E_1A
= U$ is upper triangular. No multiplier is bigger than 1 in absolute value.
A(1:k, k) is overwritten by U(1:k, k), k = 1:n. A(k+1:n, k) is overwritten
by $-M_k$(k+1:n, k), k = 1:n-1. The integer vector p(1:n-1) defines
the interchange permutations. In particular, $E_k$ interchanges rows k and
p(k), k = 1:n-1.

    for k = 1:n-1
        Determine μ with k ≤ μ ≤ n so |A(μ,k)| = || A(k:n,k) ||_∞
        A(k,k:n) ↔ A(μ,k:n)
        p(k) = μ
        if A(k,k) ≠ 0
            rows = k+1:n
            A(rows,k) = A(rows,k)/A(k,k)
            A(rows,rows) = A(rows,rows) - A(rows,k)A(k,rows)
        end
    end

Note that if $\|A(k:n, k)\|_\infty = 0$ in step k, then in exact arithmetic the first
k columns of A are linearly dependent. In contrast to Algorithm 3.2.1, this
poses no difficulty. We merely skip over the zero pivot.

The overhead associated with partial pivoting is minimal from the standpoint
of floating point arithmetic as there are only $O(n^2)$ comparisons associated
with the search for the pivots. The overall algorithm involves $2n^3/3$
flops.

To solve the linear system $Ax = b$ after invoking Algorithm 3.4.1 we

• Compute $y = M_{n-1}E_{n-1} \cdots M_1E_1b$.
• Solve the upper triangular system $Ux = y$.

All the information necessary to do this is contained in the array A and the
pivot vector p. Indeed, the calculation

    for k = 1:n-1
        b(k) ↔ b(p(k))
        b(k+1:n) = b(k+1:n) - b(k)A(k+1:n,k)
    end

overwrites b with $M_{n-1}E_{n-1} \cdots M_1E_1b$.

Example 3.4.1 If Algorithm 3.4.1 is applied to

$$
A = \begin{bmatrix} 3 & 17 & 10 \\ 2 & 4 & -2 \\ 6 & 18 & -12 \end{bmatrix},
$$

then upon exit

$$
A = \begin{bmatrix} 6 & 18 & -12 \\ 1/3 & 8 & 16 \\ 1/2 & -1/4 & 6 \end{bmatrix}
$$

and p = [3, 3]. These two quantities encode all the information associated with the
reduction:

$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1/4 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ -1/3 & 1 & 0 \\ -1/2 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} A =
\begin{bmatrix} 6 & 18 & -12 \\ 0 & 8 & 16 \\ 0 & 0 & 6 \end{bmatrix}.
$$

3.4.4 Where is L? 


Gaussian elimination with partial pivoting computes the LU factorization of
a row permuted version of $A$. The proof is a messy subscripting argument.


Theorem 3.4.1 If Gaussian elimination with partial pivoting is used to
compute the upper triangularization

$$
M_{n-1}E_{n-1} \cdots M_1E_1A = U \tag{3.4.1}
$$

via Algorithm 3.4.1, then

$$
PA = LU
$$

where $P = E_{n-1} \cdots E_1$ and $L$ is a unit lower triangular matrix with $|l_{ij}| \le 1$.
The kth column of $L$ below the diagonal is a permuted version of the
kth Gauss vector. In particular, if $M_k = I - \tau^{(k)}e_k^T$, then L(k+1:n, k) =
g(k+1:n) where $g = E_{n-1} \cdots E_{k+1}\tau^{(k)}$.


Proof. A manipulation of (3.4.1) reveals that $\tilde{M}_{n-1} \cdots \tilde{M}_1PA = U$ where
$\tilde{M}_{n-1} = M_{n-1}$ and

$$
\tilde{M}_k = E_{n-1} \cdots E_{k+1} M_k E_{k+1} \cdots E_{n-1}, \qquad k \le n-2.
$$

Since each $E_j$ is an interchange permutation involving row j and a row $\mu$
with $\mu \ge j$ we have $E_j(1:j-1, 1:j-1) = I_{j-1}$. It follows that each $\tilde{M}_k$ is
a Gauss transform with Gauss vector $\tilde{\tau}^{(k)} = E_{n-1} \cdots E_{k+1}\tau^{(k)}$. □


As a consequence of the theorem, it is easy to see how to change Algorithm
3.4.1 so that upon completion, A(i,j) houses L(i,j) for all i > j. We
merely apply each $E_k$ to all the previously computed Gauss vectors. This
is accomplished by changing the line "A(k,k:n) ↔ A(μ,k:n)" in Algorithm
3.4.1 to "A(k,1:n) ↔ A(μ,1:n)."


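Example 3.4.2 The factorization PA = LU of the matrix in Example 3.4.1 is given by

$$
\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 3 & 17 & 10 \\ 2 & 4 & -2 \\ 6 & 18 & -12 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ 1/3 & -1/4 & 1 \end{bmatrix}
\begin{bmatrix} 6 & 18 & -12 \\ 0 & 8 & 16 \\ 0 & 0 & 6 \end{bmatrix}.
$$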
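
The full-row swap variant described after Theorem 3.4.1 can be sketched as below. The name `plu` is hypothetical; the permutation is returned as a 0-based list `perm`, where `perm[i]` is the index of the original row that ends up in position i, which is an encoding chosen here for convenience rather than the interchange vector p of Algorithm 3.4.1.

```python
from fractions import Fraction

def plu(A):
    """Partial pivoting with full-row swaps: returns (perm, L, U) with (PA)[i] = A[perm[i]]."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n - 1):
        mu = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[mu] = A[mu], A[k]                  # swap entire rows, multipliers included
        perm[k], perm[mu] = perm[mu], perm[k]
        if A[k][k] != 0:
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
                for j in range(k + 1, n):
                    A[i][j] -= A[i][k] * A[k][j]
    L = [[A[i][j] if j < i else Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[A[i][j] if j >= i else Fraction(0) for j in range(n)] for i in range(n)]
    return perm, L, U

A0 = [[3, 17, 10], [2, 4, -2], [6, 18, -12]]
perm, L, U = plu(A0)
PA = [A0[i] for i in perm]
LU = [[sum(L[i][k] * U[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
```

Because whole rows are exchanged, the previously stored multipliers move with their rows, and the strictly lower triangle of the working array is exactly L.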


3.4.5 The Gaxpy Version 


Tn §3.2 we developed outer product and gaxpy schemes for computing the 
LU factorization. Having just incorporated pivoting in the outer product 
version, it is natural to do the same with the gaxpy approach. Recall from 
(3.2.5) the general structure of the gaxpy LU process: 


Log 
= 0 
for j = Ln 
ifj=1 
v(j:n) = A(j:n, 7) 
else 
Solve L(1:j — 1, 1:7 — 1)z = A(1:j — 1, 7} for z 
and set U(1:j — 1, j} = z. 
v(j:n) = À(j:n, 3) — L(3:n, 1:4 — 1)2 
end 
ifj<cn 
L(j + Lin, j) = v(j + L:n)/u(3) 
end 
U({j, j) = v) 
end 


With partial pivoting we search |v(j:n)| for its maximal element and proceed
accordingly. Assuming A is nonsingular so no zero pivots are encountered
we obtain

    L = I; U = 0
    for j = 1:n
        if j = 1
            v(j:n) = A(j:n, j)
        else
            Solve L(1:j-1, 1:j-1)z = A(1:j-1, j)
                for z and set U(1:j-1, j) = z.
            v(j:n) = A(j:n, j) - L(j:n, 1:j-1)z
        end                                                   (3.4.2)
        if j < n
            Determine μ with j ≤ μ ≤ n so |v(μ)| = || v(j:n) ||_∞.
            p(j) = μ
            v(j) ↔ v(μ)
            A(j, j+1:n) ↔ A(μ, j+1:n)
            L(j+1:n, j) = v(j+1:n)/v(j)
            if j > 1
                L(j, 1:j-1) ↔ L(μ, 1:j-1)
            end
        end
        U(j, j) = v(j)
    end


In this implementation, we emerge with the factorization PA = LU where
$P = E_{n-1} \cdots E_1$ and $E_k$ is obtained by interchanging rows k and p(k) of
the n-by-n identity. As with Algorithm 3.4.1, this procedure requires $2n^3/3$
flops and $O(n^2)$ comparisons.
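As a hedged illustration of (3.4.2), here is a short Python sketch of the gaxpy LU process with partial pivoting. It follows the pseudocode step by step but uses 0-based indices and exact rational arithmetic; the function name `gaxpy_lu_pivot` and the list `piv` (playing the role of p) are assumptions of this sketch, and A is assumed nonsingular.

```python
from fractions import Fraction

def gaxpy_lu_pivot(A):
    """Gaxpy LU with partial pivoting; returns (piv, L, U) with P A = L U."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(0)] * n for _ in range(n)]
    piv = list(range(n))                      # piv[j] plays the role of p(j)
    for j in range(n):
        # Forward substitution: L(1:j-1, 1:j-1) z = A(1:j-1, j).
        z = []
        for i in range(j):
            z.append(A[i][j] - sum(L[i][k] * z[k] for k in range(i)))
            U[i][j] = z[i]
        # v(j:n) = A(j:n, j) - L(j:n, 1:j-1) z, keyed by global row index.
        v = {i: A[i][j] - sum(L[i][k] * z[k] for k in range(j))
             for i in range(j, n)}
        if j < n - 1:
            mu = max(range(j, n), key=lambda i: abs(v[i]))
            piv[j] = mu
            v[j], v[mu] = v[mu], v[j]
            for c in range(j + 1, n):         # swap the unprocessed columns of A
                A[j][c], A[mu][c] = A[mu][c], A[j][c]
            for c in range(j):                # swap the computed part of L
                L[j][c], L[mu][c] = L[mu][c], L[j][c]
            for i in range(j + 1, n):
                L[i][j] = v[i] / v[j]
        U[j][j] = v[j]
    return piv, L, U

piv, L, U = gaxpy_lu_pivot([[3, 17, 10], [2, 4, -2], [6, 18, -12]])
```

On the matrix of Example 3.4.2 this produces the same L and U as the outer product version, which is what the theory predicts since both compute the partial pivoting factorization.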


3.4.6 Error Analysis 

We now examine the stability that is obtained with partial pivoting. This 
requires an accounting of the rounding errors that are sustained during 
elimination and during the triangular system solving. Bearing in mind 
that there are no rounding errors associated with permutation, it is not 
hard to show using Theorem 3.3.2 that the computed solution $\hat{x}$ satisfies
$(A + E)\hat{x} = b$ where

$$|E| \le nu\left(3|A| + 5\hat{P}^T|\hat{L}||\hat{U}|\right) + O(u^2). \qquad (3.4.3)$$

Here we are assuming that $\hat{P}$, $\hat{L}$, and $\hat{U}$ are the computed analogs of P,
L, and U as produced by the above algorithms. Pivoting implies that the
elements of $\hat{L}$ are bounded by one. Thus $\|\hat{L}\|_\infty \le n$ and we obtain the
bound

$$\|E\|_\infty \le nu\left(3\|A\|_\infty + 5n\|\hat{U}\|_\infty\right) + O(u^2). \qquad (3.4.4)$$

The problem now is to bound $\|\hat{U}\|_\infty$. Define the growth factor ρ by

$$\rho = \max_{i,j,k} \frac{|\hat{a}_{ij}^{(k)}|}{\|A\|_\infty} \qquad (3.4.5)$$

where $\hat{A}^{(k)}$ is the computed version of the matrix $A^{(k)} = M_k E_k \cdots M_1 E_1 A$.
It follows that

$$\|E\|_\infty \le 8n^3\rho\|A\|_\infty u + O(u^2). \qquad (3.4.6)$$


Whether or not this compares favorably with the ideal bound (3.3.1) hinges
upon the size of the growth factor ρ. (The factor $n^3$ is not an operating
factor in practice and may be ignored in this discussion.) The growth factor
measures how large the numbers become during the process of elimination.
In practice, ρ is usually of order 10 but it can also be as large as $2^{n-1}$.
Despite this, most numerical analysts regard the occurrence of serious element
growth in Gaussian elimination with partial pivoting as highly unlikely in
practice. The method can be used with confidence.


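Example 3.4.3 If Gaussian elimination with partial pivoting is applied to the problem

$$\begin{bmatrix} .001 & 1.00 \\ 1.00 & 2.00 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1.00 \\ 3.00 \end{bmatrix}$$

with β = 10, t = 3, floating point arithmetic, then

$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad \hat{L} = \begin{bmatrix} 1.00 & 0 \\ .001 & 1.00 \end{bmatrix}, \qquad \hat{U} = \begin{bmatrix} 1.00 & 2.00 \\ 0 & 1.00 \end{bmatrix}$$

and $\hat{x} = (1.00, .996)^T$. Compare with Example 3.3.1.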


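Example 3.4.4 If $A \in \mathbb{R}^{n \times n}$ is defined by

$$a_{ij} = \begin{cases} 1 & \text{if } i = j \text{ or } j = n \\ -1 & \text{if } i > j \\ 0 & \text{otherwise} \end{cases}$$

then A has an LU factorization with $|l_{ij}| \le 1$ and $u_{nn} = 2^{n-1}$.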
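The worst-case growth in this example is easy to observe numerically. The sketch below (illustrative names, exact rational arithmetic, both assumptions of this sketch) builds the matrix for a given n, eliminates with partial pivoting, and returns the last pivot together with the largest multiplier magnitude.

```python
from fractions import Fraction

def growth_matrix(n):
    """a_ij = 1 if i == j or j == n, -1 if i > j, 0 otherwise."""
    return [[1 if (i == j or j == n - 1) else (-1 if i > j else 0)
             for j in range(n)] for i in range(n)]

def last_pivot(A):
    """Eliminate with partial pivoting; return (u_nn, largest |multiplier|)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    biggest = Fraction(0)
    for k in range(n - 1):
        mu = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[mu] = A[mu], A[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            biggest = max(biggest, abs(m))
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
    return A[n - 1][n - 1], biggest

for n in (2, 5, 10):
    print(n, last_pivot(growth_matrix(n)))
```

Each elimination step doubles the entries of the last column, which is why the final pivot equals $2^{n-1}$ even though every multiplier has magnitude one.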


3.4.7 Block Gaussian Elimination 


Gaussian Elimination with partial pivoting can be organized so that it is 
rich in level-3 operations. We detail a block outer product procedure but 
block gaxpy and block dot product formulations are also possible. See 
Dayde and Duff (1988). 

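Assume $A \in \mathbb{R}^{n \times n}$ and for clarity that n = rN. Partition A as follows:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

where $A_{11}$ is r-by-r and $A_{22}$ is (n - r)-by-(n - r).
The first step in the block reduction is typical and proceeds as follows:

• Use scalar Gaussian elimination with partial pivoting (e.g. a rectangular
version of Algorithm 3.4.1) to compute permutation $P_1 \in \mathbb{R}^{n \times n}$,
unit lower triangular $L_{11} \in \mathbb{R}^{r \times r}$ and upper triangular $U_{11} \in \mathbb{R}^{r \times r}$ so

$$P_1 \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} = \begin{bmatrix} L_{11} \\ L_{21} \end{bmatrix} U_{11}.$$

• Apply the $P_1$ across the rest of A:

$$\begin{bmatrix} \tilde{A}_{12} \\ \tilde{A}_{22} \end{bmatrix} = P_1 \begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix}.$$

• Solve the lower triangular multiple right hand side problem
$L_{11}U_{12} = \tilde{A}_{12}$.

• Perform the level-3 update
$\tilde{A} = \tilde{A}_{22} - L_{21}U_{12}$.

With these computations we obtain the factorization

$$P_1 A = \begin{bmatrix} L_{11} & 0 \\ L_{21} & I_{n-r} \end{bmatrix} \begin{bmatrix} I_r & 0 \\ 0 & \tilde{A} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ 0 & I_{n-r} \end{bmatrix}.$$

The process is then repeated on the first r columns of $\tilde{A}$.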

In general, during step k ($1 \le k \le N - 1$) of the block algorithm we
apply scalar Gaussian elimination to a matrix of size (n - (k - 1)r)-by-r.
An r-by-(n - kr) multiple right hand side system is solved and a level 3
update of size (n - kr)-by-(n - kr) is performed. The level 3 fraction for
the overall process is approximately given by 1 - 3/(2N). Thus, for large
N the procedure is rich in matrix multiplication.


3.4.8 Complete Pivoting 


Another pivot strategy called complete pivoting has the property that the
associated growth factor bound is considerably smaller than $2^{n-1}$. Recall
that in partial pivoting, the kth pivot is determined by scanning the current
subcolumn A(k:n, k). In complete pivoting, the largest entry in the current
submatrix A(k:n, k:n) is permuted into the (k, k) position. Thus, we
compute the upper triangularization $M_{n-1}E_{n-1} \cdots M_1 E_1 A F_1 \cdots F_{n-1} = U$
with the property that in step k we are confronted with the matrix

$$A^{(k-1)} = M_{k-1}E_{k-1} \cdots M_1 E_1 A F_1 \cdots F_{k-1}$$

and determine interchange permutations $E_k$ and $F_k$ such that

$$\left| \left(E_k A^{(k-1)} F_k\right)_{kk} \right| = \max_{k \le i,j \le n} \left| \left(E_k A^{(k-1)} F_k\right)_{ij} \right|.$$


We have the analog of Theorem 3.4.1 


Theorem 3.4.2 If Gaussian elimination with complete pivoting is used to
compute the upper triangularization

$$M_{n-1}E_{n-1} \cdots M_1 E_1 A F_1 \cdots F_{n-1} = U \qquad (3.4.7)$$

then

$$PAQ = LU$$

where $P = E_{n-1} \cdots E_1$, $Q = F_1 \cdots F_{n-1}$ and L is a unit lower triangular
matrix with $|l_{ij}| \le 1$. The kth column of L below the diagonal is a permuted
version of the kth Gauss vector. In particular, if $M_k = I - \tau^{(k)} e_k^T$ then
L(k + 1:n, k) = g(k + 1:n) where $g = E_{n-1} \cdots E_{k+1} \tau^{(k)}$.


Proof. The proof is similar to the proof of Theorem 3.4.1. Details are left
to the reader. □


Here is Gaussian elimination with complete pivoting in detail: 


Algorithm 3.4.2 (Gaussian Elimination with Complete Pivoting)
This algorithm computes the complete pivoting factorization PAQ = LU
where L is unit lower triangular and U is upper triangular. $P = E_{n-1} \cdots E_1$
and $Q = F_1 \cdots F_{n-1}$ are products of interchange permutations. A(1:k, k)
is overwritten by U(1:k, k), k = 1:n. A(k + 1:n, k) is overwritten by
L(k + 1:n, k), k = 1:n - 1. $E_k$ interchanges rows k and p(k). $F_k$ interchanges
columns k and q(k).

    for k = 1:n-1
        Determine μ with k ≤ μ ≤ n and λ with k ≤ λ ≤ n so
            |A(μ, λ)| = max{ |A(i, j)| : i = k:n, j = k:n }
        A(k, 1:n) ↔ A(μ, 1:n)
        A(1:n, k) ↔ A(1:n, λ)
        p(k) = μ
        q(k) = λ
        if A(k, k) ≠ 0
            rows = k+1:n
            A(rows, k) = A(rows, k)/A(k, k)
            A(rows, rows) = A(rows, rows) - A(rows, k)A(k, rows)
        end
    end

This algorithm requires $2n^3/3$ flops and $O(n^3)$ comparisons. Unlike partial
pivoting, complete pivoting involves a significant overhead because of the
two-dimensional search at each stage.
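A compact Python sketch of Algorithm 3.4.2 is given below. It is an illustration only: the function name, the index lists p and q (which record the accumulated row and column orderings rather than the individual interchanges p(k) and q(k)), and the use of exact rational arithmetic are assumptions of the sketch.

```python
from fractions import Fraction

def lu_complete_pivoting(A):
    """Return (p, q, L, U) with A[p[i]][q[j]] equal to the (i, j) entry of L U."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    p, q = list(range(n)), list(range(n))
    for k in range(n - 1):
        # Two-dimensional search over the current submatrix A(k:n, k:n).
        mu, lam = max(((i, j) for i in range(k, n) for j in range(k, n)),
                      key=lambda ij: abs(A[ij[0]][ij[1]]))
        A[k], A[mu] = A[mu], A[k]                  # row interchange E_k
        p[k], p[mu] = p[mu], p[k]
        for row in A:                              # column interchange F_k
            row[k], row[lam] = row[lam], row[k]
        q[k], q[lam] = q[lam], q[k]
        if A[k][k] != 0:
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
                for j in range(k + 1, n):
                    A[i][j] -= A[i][k] * A[k][j]
    L = [[A[i][j] if j < i else Fraction(int(i == j)) for j in range(n)]
         for i in range(n)]
    U = [[A[i][j] if j >= i else Fraction(0) for j in range(n)]
         for i in range(n)]
    return p, q, L, U

A0 = [[3, 17, 10], [2, 4, -2], [6, 18, -12]]
p, q, L, U = lu_complete_pivoting(A0)
```

The nested search over (i, j) pairs is the source of the $O(n^3)$ comparison count, in contrast with the single column scan of partial pivoting.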


3.4.9 Comments on Complete Pivoting 


Suppose rank(A) = r < n. It follows that at the beginning of step r + 1,
A(r+1:n, r+1:n) = 0. This implies that $E_k = F_k = M_k = I$ for k = r+1:n
and so the algorithm can be terminated after step r with the following
factorization in hand:

$$PAQ = LU = \begin{bmatrix} L_{11} & 0 \\ L_{21} & I_{n-r} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ 0 & 0 \end{bmatrix}.$$

Here $L_{11}$ and $U_{11}$ are r-by-r and $L_{21}$ and $U_{12}^T$ are (n - r)-by-r. Thus,
Gaussian elimination with complete pivoting can in principle be used to
determine the rank of a matrix. Yet roundoff errors make the probability
of encountering an exactly zero pivot remote. In practice one would have to
“declare” A to have rank k if the pivot element in step k + 1 was sufficiently
small. The numerical rank determination problem is discussed in detail in
§5.4.

Wilkinson (1961) has shown that in exact arithmetic the elements of
the matrix $A^{(k)} = M_k E_k \cdots M_1 E_1 A F_1 \cdots F_k$ satisfy

$$|a_{ij}^{(k)}| \le k^{1/2}\left(2 \cdot 3^{1/2} \cdots k^{1/(k-1)}\right)^{1/2} \max |a_{ij}|. \qquad (3.4.8)$$

The upper bound is a rather slow-growing function of k. This fact coupled
with vast empirical evidence suggesting that ρ is always modestly sized (e.g.,
ρ = 10) permits us to conclude that Gaussian elimination with complete
pivoting is stable. The method solves a nearby linear system $(A + E)\hat{x} = b$
exactly in the sense of (3.3.1). However, there appears to be no practical
justification for choosing complete pivoting over partial pivoting except in
cases where rank determination is an issue.
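To get a feel for how slowly the factor in (3.4.8) grows compared with the partial pivoting worst case $2^{k-1}$, one can simply evaluate it. The helper below is a small sketch; its name is an assumption, and the product is accumulated in logarithms only to avoid intermediate overflow.

```python
import math

def wilkinson_bound(k):
    """k^(1/2) * (2 * 3^(1/2) * ... * k^(1/(k-1)))^(1/2), the factor in (3.4.8)."""
    log_prod = sum(math.log(j) / (j - 1) for j in range(2, k + 1))
    return math.sqrt(k) * math.exp(0.5 * log_prod)

for k in (10, 100, 1000):
    print(k, wilkinson_bound(k), 2.0 ** (k - 1))
```

The printed pairs place the complete pivoting bound next to $2^{k-1}$ for the same k.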


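Example 3.4.5 If Gaussian elimination with complete pivoting is applied to the problem

$$\begin{bmatrix} .001 & 1.00 \\ 1.00 & 2.00 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1.00 \\ 3.00 \end{bmatrix}$$

with β = 10, t = 3, floating point arithmetic, then

$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad Q = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \hat{L} = \begin{bmatrix} 1.00 & 0.00 \\ .500 & 1.00 \end{bmatrix}, \quad \hat{U} = \begin{bmatrix} 2.00 & 1.00 \\ 0.00 & -.499 \end{bmatrix}$$

and $\hat{x} = [1.00, 1.00]^T$. Compare with Examples 3.3.1 and 3.4.3.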


3.4.10 The Avoidance of Pivoting 


For certain classes of matrices it is not necessary to pivot. It is important
to identify such classes because pivoting usually degrades performance. To
illustrate the kind of analysis required to prove that pivoting can be safely
avoided, we consider the case of diagonally dominant matrices. We say that
$A \in \mathbb{R}^{n \times n}$ is strictly diagonally dominant if

$$|a_{ii}| > \sum_{j=1,\, j \ne i}^{n} |a_{ij}|, \qquad i = 1:n.$$

The following theorem shows how this property can ensure a nice,
no-pivoting LU factorization.


Theorem 3.4.3 If $A^T$ is strictly diagonally dominant, then A has an LU
factorization and $|l_{ij}| \le 1$. In other words, if Algorithm 3.4.1 is applied,
then P = I.


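Proof. Partition A as follows

$$A = \begin{bmatrix} \alpha & w^T \\ v & C \end{bmatrix}$$

where α is 1-by-1 and note that after one step of the outer product LU
process we have the factorization

$$\begin{bmatrix} \alpha & w^T \\ v & C \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ v/\alpha & I \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & C - vw^T/\alpha \end{bmatrix} \begin{bmatrix} \alpha & w^T \\ 0 & I \end{bmatrix}.$$

The theorem follows by induction on n if we can show that the transpose
of $B = C - vw^T/\alpha$ is strictly diagonally dominant. This is because we may
then assume that B has an LU factorization $B = L_1U_1$, and that implies

$$A = \begin{bmatrix} 1 & 0 \\ v/\alpha & L_1 \end{bmatrix} \begin{bmatrix} \alpha & w^T \\ 0 & U_1 \end{bmatrix} \equiv LU.$$

But the proof that $B^T$ is strictly diagonally dominant is straightforward.
From the definitions we have

$$\begin{aligned}
\sum_{i=1,\, i \ne j}^{n-1} |b_{ij}| &= \sum_{i=1,\, i \ne j}^{n-1} |c_{ij} - v_i w_j/\alpha| \le \sum_{i=1,\, i \ne j}^{n-1} |c_{ij}| + \frac{|w_j|}{|\alpha|} \sum_{i=1,\, i \ne j}^{n-1} |v_i| \\
&< (|c_{jj}| - |w_j|) + \frac{|w_j|}{|\alpha|}(|\alpha| - |v_j|) \\
&\le \left| c_{jj} - \frac{w_j v_j}{\alpha} \right| = |b_{jj}|. \qquad \Box
\end{aligned}$$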
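The induction in the proof can be watched at work on a small example. The sketch below (illustrative function names, exact rational arithmetic, and a test matrix chosen for this illustration) runs outer product LU without interchanges on a matrix whose transpose is strictly diagonally dominant and checks after every step that the remaining block B still has that property.

```python
from fractions import Fraction

def is_column_dominant(A):
    """True if A^T is strictly diagonally dominant."""
    n = len(A)
    return all(abs(A[j][j]) > sum(abs(A[i][j]) for i in range(n) if i != j)
               for j in range(n))

def lu_no_pivoting(A):
    """Outer product LU without interchanges; asserts dominance of each B."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    for k in range(n - 1):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
        B = [row[k + 1:] for row in A[k + 1:]]     # B = C - v w^T / alpha
        assert is_column_dominant(B)
    return A    # L strictly below the diagonal, U on and above it

A = [[4, 1, -2], [1, 5, 1], [-2, 3, 6]]
F = lu_no_pivoting(A)
```

Every multiplier stored below the diagonal of F has magnitude at most one, so partial pivoting would never have requested an interchange.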
3.4.11 Some Applications


We conclude with some examples that illustrate how to think in terms of 
matrix factorizations when confronted with various linear equation situa- 
tions. 

Suppose A is nonsingular and n-by-n and that B is n-by-p. Consider the
problem of finding X (n-by-p) so AX = B, i.e., the multiple right hand side
problem. If $X = [x_1, \ldots, x_p]$ and $B = [b_1, \ldots, b_p]$ are column partitions,
then

    Compute PA = LU.
    for k = 1:p
        Solve Ly = P b_k                                      (3.4.9)
        Solve U x_k = y
    end

Note that A is factored just once. If $B = I_n$, then we emerge with a
computed $A^{-1}$.

As another example of getting the LU factorization “outside the loop,”
suppose we want to solve the linear system $A^k x = b$ where $A \in \mathbb{R}^{n \times n}$,
$b \in \mathbb{R}^n$, and k is a positive integer. One approach is to compute $C = A^k$
and then solve Cx = b. However, the matrix multiplications can be avoided
altogether:

    Compute PA = LU
    for j = 1:k
        Overwrite b with the solution to Ly = Pb.             (3.4.10)
        Overwrite b with the solution to Ux = b.
    end


As a final example we show how to avoid the pitfall of explicit inverse
computation. Suppose we are given $A \in \mathbb{R}^{n \times n}$, $d \in \mathbb{R}^n$, and $c \in \mathbb{R}^n$ and
that we want to compute $s = c^T A^{-1} d$. One approach is to compute
$X = A^{-1}$ as suggested above and then compute $s = c^T X d$. A more economical
procedure is to compute PA = LU and then solve the triangular systems
Ly = Pd and Ux = y. It follows that $s = c^T x$. The point of this example is
to stress that when a matrix inverse is encountered in a formula, we must
think in terms of solving equations rather than in terms of explicit inverse
formation.
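The last two applications can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a library interface: the helper names `factor`, `solve`, `solve_power` and `bilinear_inverse` are invented here, the factorization is stored in a single packed array, and exact rational arithmetic is used so that the results can be checked exactly.

```python
from fractions import Fraction

def factor(A):
    """PA = LU with partial pivoting; returns (perm, LU) with L and U packed."""
    n = len(A)
    LU = [[Fraction(x) for x in row] for row in A]
    perm = list(range(n))
    for k in range(n - 1):
        mu = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[mu] = LU[mu], LU[k]
        perm[k], perm[mu] = perm[mu], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return perm, LU

def solve(perm, LU, b):
    """Solve Ly = Pb, then Ux = y, reusing an existing factorization."""
    n = len(b)
    y = [Fraction(b[perm[i]]) for i in range(n)]
    for i in range(n):                         # forward elimination
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    for i in reversed(range(n)):               # back substitution
        y[i] = (y[i] - sum(LU[i][j] * y[j] for j in range(i + 1, n))) / LU[i][i]
    return y

def solve_power(A, b, k):
    """Solve A^k x = b with one factorization and k pairs of triangular solves."""
    perm, LU = factor(A)
    x = list(b)
    for _ in range(k):
        x = solve(perm, LU, x)
    return x

def bilinear_inverse(A, c, d):
    """s = c^T A^{-1} d without forming the inverse."""
    x = solve(*factor(A), d)
    return sum(ci * xi for ci, xi in zip(c, x))

A = [[3, 17, 10], [2, 4, -2], [6, 18, -12]]
print(solve_power(A, [329, 138, 402], 2))      # right hand side equals A*A*(1, 2, 3)^T
print(bilinear_inverse(A, [1, 1, 1], [67, 4, 6]))
```

In both routines the cubic cost of the factorization is paid once and every further solve costs only quadratic work.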


Problems


P3.4.1 Let A = LU be the LU factorization of n-by-n A with $|l_{ij}| \le 1$. Let $a_i^T$ and $u_i^T$
denote the ith rows of A and U, respectively. Verify the equation

$$u_i^T = a_i^T - \sum_{j=1}^{i-1} l_{ij} u_j^T$$

and use it to show that $\|U\|_\infty \le 2^{n-1}\|A\|_\infty$. (Hint: Take norms and use induction.)


P3.4.2 Show that if PAQ = LU is obtained via Gaussian elimination with complete
pivoting, then no element of U(i, i:n) is larger in absolute value than $|u_{ii}|$.

P3.4.3 Suppose $A \in \mathbb{R}^{n \times n}$ has an LU factorization and that L and U are known. Give
an algorithm which can compute the (i, j) entry of $A^{-1}$ in approximately $(n-j)^2 + (n-i)^2$
flops.

P3.4.4 Suppose $\hat{X}$ is the computed inverse obtained via (3.4.9). Give an upper bound
for $\|A\hat{X} - I\|_F$.

P3.4.5 Prove Theorem 3.4.2. 

P3.4.6 Extend Algorithm 3.4.3 so that it can factor an arbitrary rectangular matrix. 


P3.4.7 Write a detailed version of the block elimination algorithm outlined in §3.4.7. 
these procedures and their implementation see


K. Gallivan, W. Jalby, U. Meier, and A.H. Sameh (1988). “Impact of Hierarchical Mem- 
ory Systems on Linear Algebra Algorithm Design," Int'l J. Supercomputer Applic. 
2, 12-48. 


3.5 Improving and Estimating Accuracy 


Suppose Gaussian elimination with partial pivoting is used to solve the
n-by-n system Ax = b. Assume t-digit, base β floating point arithmetic is
used. Equation (3.4.6) essentially says that if the growth factor is modest
then the computed solution $\hat{x}$ satisfies

$$(A + E)\hat{x} = b, \qquad \|E\|_\infty \approx u\|A\|_\infty, \qquad u = \tfrac{1}{2}\beta^{1-t}. \qquad (3.5.1)$$


In this section we explore the practical ramifications of this result. We begin 
by stressing the distinction that should be made between residual size and 
accuracy. This is followed by a discussion of scaling, iterative improvement, 
and condition estimation. See Higham (1996) for a more detailed treatment 
of these topics. 

We make two notational remarks at the outset. First, the infinity norm is used
throughout since it is very handy in roundoff error analysis and in practical
error estimation. Second, whenever we refer to “Gaussian elimination” in
this section we really mean Gaussian elimination with some stabilizing pivot
strategy such as partial pivoting.


3.5.1 Residual Size Versus Accuracy 


The residual of a computed solution $\hat{x}$ to the linear system Ax = b is the
vector $b - A\hat{x}$. A small residual means that $A\hat{x}$ effectively “predicts” the
right hand side b. From (3.5.1) we have $\|b - A\hat{x}\|_\infty \approx u\|A\|_\infty\|\hat{x}\|_\infty$
and so we obtain


Heuristic I. Gaussian elimination produces a solution $\hat{x}$ with a relatively
small residual.


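Small residuals do not imply high accuracy. Combining (3.3.2) and (3.5.1),
we see that

$$\frac{\|\hat{x} - x\|_\infty}{\|x\|_\infty} \approx u\kappa_\infty(A). \qquad (3.5.2)$$

This justifies a second guiding principle.

Heuristic II. If the unit roundoff and condition satisfy $u \approx 10^{-d}$ and
$\kappa_\infty(A) \approx 10^q$, then Gaussian elimination produces a solution $\hat{x}$ that
has about d - q correct decimal digits.


If $u\kappa_\infty(A)$ is large, then we say that A is ill-conditioned with respect to
the machine precision.
As an illustration of Heuristics I and II, consider the system

$$\begin{bmatrix} .986 & .579 \\ .409 & .237 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} .235 \\ .107 \end{bmatrix}$$

in which $\kappa_\infty(A) \approx 700$ and $x = (2, -3)^T$. Here is what we find for various
machine precisions:

[Table not recoverable from the source. Its surviving column headings are the relative error $\|\hat{x} - x\|_\infty / \|x\|_\infty$ and the relative residual $\|b - A\hat{x}\|_\infty / (\|A\|_\infty\|\hat{x}\|_\infty)$.]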


Whether or not one is content with the computed solution $\hat{x}$ depends on
the requirements of the underlying source problem. In many applications
accuracy is not important but small residuals are. In such a situation, the
$\hat{x}$ produced by Gaussian elimination is probably adequate. On the other
hand, if the number of correct digits in $\hat{x}$ is an issue then the situation
is more complicated and the discussion in the remainder of this section is
relevant.
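The gap between residual size and accuracy can be reproduced with Python's `decimal` module, which rounds every operation to a chosen number of significant decimal digits. The sketch below is an emulation under assumptions: `solve2x2` and `quality` are illustrative names, the rounding mode is the module's default (round half even), and the system is the 2-by-2 example with exact solution $(2, -3)^T$ used in this subsection.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

A = [["0.986", "0.579"], ["0.409", "0.237"]]
b = ["0.235", "0.107"]
x_exact = [Fraction(2), Fraction(-3)]

def solve2x2(A, b, t):
    """Gaussian elimination with partial pivoting in t-digit decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = t
        (a11, a12), (a21, a22) = [[Decimal(s) for s in row] for row in A]
        b1, b2 = [Decimal(s) for s in b]
        if abs(a21) > abs(a11):                    # row interchange
            a11, a12, b1, a21, a22, b2 = a21, a22, b2, a11, a12, b1
        m = a21 / a11
        u22 = a22 - m * a12
        y2 = b2 - m * b1
        x2 = y2 / u22
        x1 = (b1 - a12 * x2) / a11
        return x1, x2

def quality(xhat):
    """Exact relative error and relative residual in the infinity norm."""
    Af = [[Fraction(s) for s in row] for row in A]
    bf = [Fraction(s) for s in b]
    xf = [Fraction(v) for v in xhat]
    err = max(abs(xf[i] - x_exact[i]) for i in range(2)) / max(abs(v) for v in x_exact)
    res = max(abs(bf[i] - sum(Af[i][j] * xf[j] for j in range(2))) for i in range(2))
    norm_a = max(sum(abs(a) for a in row) for row in Af)
    return err, res / (norm_a * max(abs(v) for v in xf))

for t in (3, 4, 5, 6):
    xhat = solve2x2(A, b, t)
    err, res = quality(xhat)
    print(t, xhat, float(err), float(res))
```

With t = 3 the computed solution is (2.11, -3.17): the relative residual is far smaller than the relative error, as Heuristics I and II suggest for a matrix whose condition is in the hundreds.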
3.5.2 Scaling

Let β be the machine base and define the diagonal matrices $D_1$ and $D_2$ by

$$D_1 = \mathrm{diag}(\beta^{r_1}, \ldots, \beta^{r_n}), \qquad D_2 = \mathrm{diag}(\beta^{c_1}, \ldots, \beta^{c_n}).$$

The solution to the n-by-n linear system Ax = b can be found by solving
the scaled system $(D_1^{-1} A D_2) y = D_1^{-1} b$ using Gaussian elimination and
then setting $x = D_2 y$. The scalings of A, b, and y require only $O(n^2)$ flops
and may be accomplished without roundoff. Note that $D_1$ scales equations
and $D_2$ scales unknowns.

It follows from Heuristic II that if $\hat{x}$ and $\hat{y}$ are the computed versions of
x and y, then

$$\frac{\|D_2^{-1}(\hat{x} - x)\|_\infty}{\|D_2^{-1} x\|_\infty} = \frac{\|\hat{y} - y\|_\infty}{\|y\|_\infty} \approx u\kappa_\infty(D_1^{-1} A D_2). \qquad (3.5.3)$$

Thus, if $\kappa_\infty(D_1^{-1} A D_2)$ can be made considerably smaller than $\kappa_\infty(A)$, then
we might expect a correspondingly more accurate $\hat{x}$, provided errors are
measured in the “$D_2$” norm defined by $\|z\|_{D_2} = \|D_2^{-1} z\|_\infty$. This is the
objective of scaling. Note that it encompasses two issues: the condition
of the scaled problem and the appropriateness of appraising error in the
$D_2$-norm.

An interesting but very difficult mathematical problem concerns the
exact minimization of $\kappa_p(D_1^{-1} A D_2)$ for general diagonal $D_i$ and various
p. What results there are in this direction are not very practical. This is
hardly discouraging, however, when we recall that (3.5.3) is heuristic and
it makes little sense to minimize exactly a heuristic bound. What we seek
is a fast, approximate method for improving the quality of the computed
solution $\hat{x}$.

One technique of this variety is simple row scaling. In this scheme $D_2$ is
the identity and $D_1$ is chosen so that each row in $D_1^{-1} A$ has approximately
the same ∞-norm. Row scaling reduces the likelihood of adding a very
small number to a very large number during elimination, an event that
can greatly diminish accuracy.

Slightly more complicated than simple row scaling is row-column
equilibration. Here, the object is to choose $D_1$ and $D_2$ so that the ∞-norm
of each row and column of $D_1^{-1} A D_2$ belongs to the interval [1/β, 1] where
β is the base of the floating point system. For work along these lines see
McKeeman (1962).

It cannot be stressed too much that simple row scaling and row-column
equilibration do not “solve” the scaling problem. Indeed, either technique
can render a worse $\hat{x}$ than if no scaling whatever is used. The ramifications
of this point are thoroughly discussed in Forsythe and Moler (1967, chapter
11). The basic recommendation is that the scaling of equations and
unknowns must proceed on a problem-by-problem basis. General scaling
strategies are unreliable. It is best to scale (if at all) on the basis of what the
source problem proclaims about the significance of each $a_{ij}$. Measurement
units and data error may have to be considered.


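Example 3.5.1 (Forsythe and Moler (1967, pp. 34, 40)). If

$$\begin{bmatrix} 10 & 100{,}000 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 100{,}000 \\ 2 \end{bmatrix}$$

and the equivalent row-scaled problem

$$\begin{bmatrix} .0001 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

are each solved using β = 10, t = 3 arithmetic, then solutions $\hat{x} = (0.00, 1.00)^T$ and
$\hat{x} = (1.00, 1.00)^T$ are respectively computed. Note that $x = (1.0001\ldots, .9999\ldots)^T$ is
the exact solution.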
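Example 3.5.1 can be replayed with a three-digit decimal emulation. The sketch below relies on Python's `decimal` module with its default rounding; `solve2x2` is an illustrative helper written for this purpose, not a routine from the text.

```python
from decimal import Decimal, localcontext

def solve2x2(A, b, t):
    """Gaussian elimination with partial pivoting in t-digit decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = t
        (a11, a12), (a21, a22) = [[Decimal(s) for s in row] for row in A]
        b1, b2 = [Decimal(s) for s in b]
        if abs(a21) > abs(a11):                    # row interchange
            a11, a12, b1, a21, a22, b2 = a21, a22, b2, a11, a12, b1
        m = a21 / a11
        u22 = a22 - m * a12
        y2 = b2 - m * b1
        x2 = y2 / u22
        x1 = (b1 - a12 * x2) / a11
        return x1, x2

unscaled = solve2x2([["10", "100000"], ["1", "1"]], ["100000", "2"], 3)
scaled = solve2x2([["0.0001", "1"], ["1", "1"]], ["1", "2"], 3)
print(unscaled, scaled)
```

In the unscaled problem the entry 10 is accepted as pivot and the small entries of the second row are swamped; after row scaling the interchange is made and both components come out correct to working precision.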


3.5.3 Iterative Improvement


Suppose Ax = b has been solved via the partial pivoting factorization PA =
LU and that we wish to improve the accuracy of the computed solution $\hat{x}$.
If we execute

    $r = b - A\hat{x}$
    Solve Ly = Pr.                                            (3.5.4)
    Solve Uz = y.
    $x_{new} = \hat{x} + z$

then in exact arithmetic $Ax_{new} = A\hat{x} + Az = (b - r) + r = b$. Unfortunately,
the naive floating point execution of these formulae renders an $\hat{x}_{new}$ that is
no more accurate than $\hat{x}$. This is to be expected since $\hat{r} = fl(b - A\hat{x})$ has
few, if any, correct significant digits. (Recall Heuristic I.) Consequently,
$\hat{z} = fl(A^{-1}\hat{r}) \approx A^{-1} \cdot \text{noise} \approx \text{noise}$ is a very poor correction from the
standpoint of improving the accuracy of $\hat{x}$. However, Skeel (1980) has done
an error analysis that indicates when (3.5.4) gives an improved $x_{new}$ from
the standpoint of backwards error. In particular, if the quantity

$$\tau = \left( \|\, |A|\,|A^{-1}| \,\|_\infty \right) \left( \max_i (|A||x|)_i \,/\, \min_i (|A||x|)_i \right)$$

is not too big, then (3.5.4) produces an $x_{new}$ such that $(A + E)x_{new} = b$
for very small E. Of course, if Gaussian elimination with partial pivoting
is used then the computed $\hat{x}$ already solves a nearby system. However,
this may not be the case for some of the pivot strategies that are used to
preserve sparsity. In this situation, the fixed precision iterative improvement
step (3.5.4) can be very worthwhile and cheap. See Arioli, Demmel, and
Duff (1988).

For (3.5.4) to produce a more accurate x, it is necessary to compute the
residual $b - A\hat{x}$ with extended precision floating point arithmetic. Typically,
this means that if t-digit arithmetic is used to compute PA = LU, x, y, and
z, then 2t-digit arithmetic is used to form $b - A\hat{x}$, i.e., double precision. The
process can be iterated. In particular, once we have computed PA = LU
and initialize x = 0, we repeat the following:

    r = b - Ax    (Double Precision)
    Solve Ly = Pr for y.                                      (3.5.5)
    Solve Uz = y for z.
    x = x + z

We refer to this process as mixed precision iterative improvement. The
original A must be used in the double precision computation of r. The
basic result concerning the performance of (3.5.5) is summarized in the
following heuristic:


Heuristic III. If the machine precision u and condition satisfy $u = 10^{-d}$
and $\kappa_\infty(A) \approx 10^q$, then after k executions of (3.5.5), x has approximately
min(d, k(d - q)) correct digits.


Roughly speaking, if $u\kappa_\infty(A) \le 1$, then iterative improvement can ultimately
produce a solution that is correct to full (single) precision. Note
that the process is relatively cheap. Each improvement costs $O(n^2)$, to be
compared with the original $O(n^3)$ investment in the factorization PA = LU.
Of course, no improvement may result if A is badly enough conditioned with
respect to the machine precision.

The primary drawback of mixed precision iterative improvement is that 
its implementation is somewhat machine-dependent. This discourages its 
use in software that is intended for wide distribution. The need for retaining 
an original copy of A is another aggravation associated with the method. 

On the other hand, mixed precision iterative improvement is usually
very easy to implement on a given machine that has provision for the
accumulation of inner products, i.e., provision for the double precision
calculation of inner products between the rows of A and x. In a short mantissa
computing environment the presence of an iterative improvement routine
can significantly widen the class of solvable Ax = b problems.
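Mixed precision iterative improvement (3.5.5) can be emulated with Python's `decimal` module by switching the working precision between t digits (factorization and triangular solves) and 2t digits (residual). The sketch below is deliberately restricted to a 2-by-2 system whose (1,1) entry is already the pivot, so P = I; the function name `improve` and this restriction are assumptions of the illustration.

```python
from decimal import Decimal, localcontext

def improve(A, b, t, steps):
    """Run `steps` passes of (3.5.5) on a 2-by-2 system; return all iterates."""
    A = [[Decimal(s) for s in row] for row in A]
    b = [Decimal(s) for s in b]
    with localcontext() as lo:                     # t-digit factorization A = LU
        lo.prec = t
        l21 = A[1][0] / A[0][0]
        u11, u12 = A[0][0], A[0][1]
        u22 = A[1][1] - l21 * A[0][1]
    x = [Decimal(0), Decimal(0)]
    history = []
    for _ in range(steps):
        with localcontext() as hi:                 # residual in 2t digits
            hi.prec = 2 * t
            r = [b[i] - A[i][0] * x[0] - A[i][1] * x[1] for i in range(2)]
        with localcontext() as lo:                 # solves and update in t digits
            lo.prec = t
            r = [+ri for ri in r]                  # round the residual to t digits
            y1 = r[0]
            y2 = r[1] - l21 * y1
            z2 = y2 / u22
            z1 = (y1 - u12 * z2) / u11
            x = [x[0] + z1, x[1] + z2]
        history.append(tuple(x))
    return history

history = improve([["0.986", "0.579"], ["0.409", "0.237"]],
                  ["0.235", "0.107"], t=3, steps=3)
for iterate in history:
    print(iterate)
```

For this system, whose exact solution is $(2, -3)^T$, the three iterates are (2.11, -3.17), (1.99, -2.99) and (2.00, -3.00), in line with Example 3.5.2.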


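Example 3.5.2 If (3.5.5) is applied to the system

$$\begin{bmatrix} .986 & .579 \\ .409 & .237 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} .235 \\ .107 \end{bmatrix}$$

and β = 10 and t = 3, then iterative improvement produces the following sequence of
computed solutions:

$$\hat{x} = \begin{bmatrix} 2.11 \\ -3.17 \end{bmatrix}, \begin{bmatrix} 1.99 \\ -2.99 \end{bmatrix}, \begin{bmatrix} 2.00 \\ -3.00 \end{bmatrix}, \ldots$$

The exact solution is $x = [2, -3]^T$.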


3.5.4 Condition Estimation 


Suppose that we have solved Ax = b via PA = LU and that we now wish
to ascertain the number of correct digits in the computed solution $\hat{x}$. It
follows from Heuristic II that in order to do this we need an estimate of the
condition $\kappa_\infty(A) = \|A\|_\infty\|A^{-1}\|_\infty$. Computing $\|A\|_\infty$ poses no problem
as we merely use the formula

$$\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|.$$

The challenge is with respect to the factor $\|A^{-1}\|_\infty$. Conceivably, we
could estimate this quantity by $\|\hat{X}\|_\infty$, where $\hat{X} = [\hat{x}_1, \ldots, \hat{x}_n]$ and $\hat{x}_i$
is the computed solution to $Ax_i = e_i$. (See §3.4.9.) The trouble with this
approach is its expense: $\hat{\kappa}_\infty = \|A\|_\infty\|\hat{X}\|_\infty$ costs about three times as
much as $\hat{x}$.

The central problem of condition estimation is how to estimate the
condition number in $O(n^2)$ flops assuming the availability of PA = LU or
some other factorizations that are presented in subsequent chapters. An
approach described in Forsythe and Moler (SLE, p. 51) is based on iterative
improvement and the heuristic $u\kappa_\infty(A) \approx \|z\|_\infty/\|x\|_\infty$ where z is the first
correction of x in (3.5.5). While the resulting condition estimator is $O(n^2)$,
it suffers from the shortcoming of iterative improvement, namely, machine
dependency.

Cline, Moler, Stewart, and Wilkinson (1979) have proposed a very successful
approach to the condition estimation problem without this flaw. It
is based on exploitation of the implication

$$Ay = d \implies \|A^{-1}\|_\infty \ge \|y\|_\infty/\|d\|_\infty.$$

The idea behind their estimator is to choose d so that the solution y is large
in norm and then set

$$\hat{\kappa}_\infty = \|A\|_\infty\|y\|_\infty/\|d\|_\infty.$$

The success of this method hinges on how close the ratio $\|y\|_\infty/\|d\|_\infty$ is
to its maximum value $\|A^{-1}\|_\infty$.

Consider the case when A = T is upper triangular. The relation between
d and y is completely specified by the following column version of back
substitution:

    p(1:n) = 0
    for k = n:-1:1
        Choose d(k).
        y(k) = (d(k) - p(k))/T(k, k)                          (3.5.6)
        p(1:k-1) = p(1:k-1) + y(k)T(1:k-1, k)
    end


Normally, we use this algorithm to solve a given triangular system Ty = d. 
Now, however, we are free to pick the right-hand side d subject to the 
"constraint" that y is large relative to d. 

One way to encourage growth in y is to choose d(k) from the set
{-1, +1} so as to maximize y(k). If p(k) ≥ 0, then set d(k) = -1. If
p(k) < 0, then set d(k) = +1. In other words, (3.5.6) is invoked with d(k)
= -sign(p(k)). Since d is then a vector of the form $d(1:n) = (\pm 1, \ldots, \pm 1)^T$,
we obtain the estimator $\hat{\kappa}_\infty = \|T\|_\infty\|y\|_\infty$.

A more reliable estimator results if d(k) ∈ {-1, +1} is chosen so as
to encourage growth both in y(k) and the updated running sum given by
p(1:k-1) + T(1:k-1, k)y(k). In particular, at step k we compute

$$\begin{aligned}
y(k)^+ &= (1 - p(k))/T(k,k) \\
s(k)^+ &= |y(k)^+| + \|\, p(1:k-1) + T(1:k-1, k)\,y(k)^+ \,\|_1 \\
y(k)^- &= (-1 - p(k))/T(k,k) \\
s(k)^- &= |y(k)^-| + \|\, p(1:k-1) + T(1:k-1, k)\,y(k)^- \,\|_1
\end{aligned}$$

and set

$$y(k) = \begin{cases} y(k)^+ & \text{if } s(k)^+ \ge s(k)^- \\ y(k)^- & \text{if } s(k)^+ < s(k)^- \end{cases}$$

This gives


Algorithm 3.5.1 (Condition Estimator) Let $T \in \mathbb{R}^{n \times n}$ be a nonsingular
upper triangular matrix. This algorithm computes unit ∞-norm y
and a scalar κ so $\|Ty\|_\infty \approx 1/\|T^{-1}\|_\infty$ and $\kappa \approx \kappa_\infty(T)$.

    p(1:n) = 0
    for k = n:-1:1
        y(k)^+ = (1 - p(k))/T(k, k)
        y(k)^- = (-1 - p(k))/T(k, k)
        p(k)^+ = p(1:k-1) + T(1:k-1, k)y(k)^+
        p(k)^- = p(1:k-1) + T(1:k-1, k)y(k)^-
        if |y(k)^+| + || p(k)^+ ||_1 ≥ |y(k)^-| + || p(k)^- ||_1
            y(k) = y(k)^+
            p(1:k-1) = p(k)^+
        else
            y(k) = y(k)^-
            p(1:k-1) = p(k)^-
        end
    end
    κ = || y ||_∞ || T ||_∞
    y = y/|| y ||_∞

The algorithm involves several times the work of ordinary back substitution. 
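A direct transcription of Algorithm 3.5.1 into Python might look as follows. It is a sketch in ordinary floating point; the function name is illustrative, T is assumed to be a nonsingular upper triangular matrix stored as a list of rows, and indices are 0-based.

```python
def condition_estimate(T):
    """Estimate kappa_inf(T) for upper triangular T; returns (kappa, y)."""
    n = len(T)
    p = [0.0] * n
    y = [0.0] * n
    for k in range(n - 1, -1, -1):
        y_plus = (1.0 - p[k]) / T[k][k]
        y_minus = (-1.0 - p[k]) / T[k][k]
        # Candidate updates of the running sums p(1:k-1).
        p_plus = [p[i] + T[i][k] * y_plus for i in range(k)]
        p_minus = [p[i] + T[i][k] * y_minus for i in range(k)]
        s_plus = abs(y_plus) + sum(abs(v) for v in p_plus)
        s_minus = abs(y_minus) + sum(abs(v) for v in p_minus)
        if s_plus >= s_minus:
            y[k], p[:k] = y_plus, p_plus
        else:
            y[k], p[:k] = y_minus, p_minus
    norm_y = max(abs(v) for v in y)
    norm_t = max(sum(abs(v) for v in row) for row in T)
    return norm_y * norm_t, [v / norm_y for v in y]

kappa, y = condition_estimate([[1.0, 2.0], [0.0, 1.0]])
print(kappa, y)
```

Because the implied right hand side d has entries ±1, the returned value can never exceed the true $\kappa_\infty(T)$; for the 2-by-2 matrix above, whose inverse has ∞-norm 3, the estimate attains the true value 9.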

We are now in a position to describe a procedure for estimating the 
condition of a square nonsingular matrix A whose PA = LU factorization 
we know: 


• Apply the lower triangular version of Algorithm 3.5.1 to $U^T$ and obtain
a large norm solution to $U^T y = d$.

• Solve the triangular systems $L^T r = y$, Lw = Pr, and Uz = w.

• $\hat{\kappa}_\infty = \|A\|_\infty\|z\|_\infty/\|r\|_\infty$.

Note that $\|z\|_\infty \le \|A^{-1}\|_\infty\|r\|_\infty$. The method is based on several heuristics.
First, if A is ill-conditioned and PA = LU, then it is usually the case
that U is correspondingly ill-conditioned. The lower triangle L tends to be
fairly well-conditioned. Thus, it is more profitable to apply the condition
estimator to U than to L. The vector r, because it solves $A^T P^T r = d$,
tends to be rich in the direction of the left singular vector associated with
$\sigma_{\min}(A)$. Right-hand sides with this property render large solutions to the
problem Az = r.

In practice, it is found that the condition estimation technique that we 
have outlined produces good order-of-magnitude estimates of the actual 
condition number. 


Problems 


P3.5.1 Show by example that there may be more than one way to equilibrate a matrix. 


P3.5.2 Using β = 10, t = 2 arithmetic, solve

$$\begin{bmatrix} 11 & 15 \\ 5 & 7 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 7 \\ 3 \end{bmatrix}$$

using Gaussian elimination with partial pivoting. Do one step of iterative improvement
using t = 4 arithmetic to compute the residual. (Do not forget to round the computed
residual to two digits.)

P3.5.3 Suppose $P(A + E) = \hat{L}\hat{U}$, where P is a permutation, $\hat{L}$ is lower triangular with
$|\hat{l}_{ij}| \le 1$, and $\hat{U}$ is upper triangular. Show that $\kappa_\infty(A) \ge \|A\|_\infty/(\|E\|_\infty + \mu)$ where
$\mu = \min |\hat{u}_{ii}|$. Conclude that if a small pivot is encountered when Gaussian elimination
with pivoting is applied to A, then A is ill-conditioned. The converse is not true. (Let
$A = B_n$.)

P3.5.4 (Kahan 1966) The system Ax = b where

$$A = \begin{bmatrix} 2 & -1 & 1 \\ -1 & 10^{-10} & 10^{-10} \\ 1 & 10^{-10} & 10^{-10} \end{bmatrix}, \qquad b = \begin{bmatrix} 2(1 + 10^{-10}) \\ -10^{-10} \\ 10^{-10} \end{bmatrix}$$

has solution $x = (10^{-10}, -1, 1)^T$. (a) Show that if (A + E)y = b and $|E| \le 10^{-8}|A|$,
then $|x - y| \le 10^{-7}|x|$. That is, small relative changes in A's entries do not induce large
changes in x even though $\kappa_\infty(A) = 10^{10}$. (b) Define $D = \mathrm{diag}(10^{-5}, 10^5, 10^5)$. Show
$\kappa_\infty(DAD) \le 5$. (c) Explain what is going on in terms of Theorem 2.7.3.

P3.5.5 Consider the matrix:

$$T = \begin{bmatrix} 1 & 0 & M & -M \\ 0 & 1 & -M & M \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad M \in \mathbb{R}.$$

What estimate of $\kappa_\infty(T)$ is produced when (3.5.6) is applied with d(k) = -sign(p(k))?
What estimate does Algorithm 3.5.1 produce? What is the true $\kappa_\infty(T)$?


P3.5.6 What does Algorithm 3.5.1 produce when applied to the matrix $B_n$ given in
(2.7.9)?

Chapter 4


Special Linear Systems 


§4.1 The $LDM^T$ and $LDL^T$ Factorizations
§4.2 Positive Definite Systems
§4.3 Banded Systems
§4.4 Symmetric Indefinite Systems
§4.5 Block Systems
§4.6 Vandermonde Systems and the FFT
§4.7 Toeplitz and Related Systems

It is a basic tenet of numerical analysis that structure should be ex- 
ploited whenever solving a problem. In numerical linear algebra, this trans- 
lates into an expectation that algorithms for general matrix problems can 
be streamlined in the presence of such properties as symmetry, definiteness, 
and sparsity. This is the central theme of the current chapter, where our 
principal aim is to devise special algorithms for computing special variants 
of the LU factorization. 

We begin by pointing out the connection between the triangular factors
L and U when A is symmetric. This is achieved by examining the
$LDM^T$ factorization in §4.1. We then turn our attention to the important
case when A is both symmetric and positive definite, deriving the stable
Cholesky factorization in §4.2. Unsymmetric positive definite systems are
also investigated in this section. In §4.3, banded versions of Gaussian
elimination and other factorization methods are discussed. We then examine the
interesting situation when A is symmetric but indefinite. Our treatment of
this problem in §4.4 highlights the numerical analyst's ambivalence towards
pivoting. We love pivoting for the stability it induces but despise it for the
structure that it can destroy. Fortunately, there is a happy resolution to
this conflict in the symmetric indefinite problem.

Any block banded matrix is also banded and so the methods of §4.3 are
applicable. Yet, there are occasions when it pays not to adopt this point of
view. To illustrate this we consider the important case of block tridiagonal
systems in §4.5. Other block systems are discussed as well.

In the final two sections we examine some very interesting $O(n^2)$ algorithms
that can be used to solve Vandermonde and Toeplitz systems.


Before You Begin 


Chapter 1, §§2.1-2.5, and §2.7, and Chapter 3 are assumed. Within this 
chapter there are the following dependencies: 


§4.5
↑
§4.1 → §4.2 → §4.3 → §4.4
↓
§4.6 → §4.7


Complementary references include George and Liu (1981), Gill, Murray,
and Wright (1991), Higham (1996), Trefethen and Bau (1996), and Demmel 
(1996). Some MATLAB functions important to this chapter: chol, tril, 
triu, vander, toeplitz, fft. LAPACK connections include 


LAPACK: General Band Matrices

Solve AX = B
Condition estimator
Improve AX = B, A^T X = B, A^H X = B solutions with error bounds
Solve AX = B, A^T X = B, A^H X = B with condition estimate
PA = LU
Solve AX = B, A^T X = B, A^H X = B via PA = LU


onda d amor 

Improve AX = B, A^T X = B, A^H X = B solutions with error bounds
Solve AX = B, A^T X = B, A^H X = B with condition estimate
PA = LU
Solve AX = B, A^T X = B, A^H X = B via PA = LU


E ais via PA = LU 
Improve AX = B solutions with error bounds
Solve AX = B with condition estimate
A = GG^T
Solve AX = B via A = GG^T
A^{-1}
Equilibration

LAPACK: Banded Symmetric Positive Definite
Solve AX = B
Condition estimate via A = GG^T
Improve AX = B solutions with error bounds
Solve AX = B with condition estimate
A = GG^T
Solve AX = B via A = GG^T


PTSV Solve AX = B
Condition estimate via A = LDL^T
Improve AX = B solutions with error bounds
Solve AX = B with condition estimate
A = LDL^T
Solve AX = B via A = LDL^T


Condition estimate via PAP^T = LDL^T
Improve AX = B solutions with error bounds
Solve AX = B with condition estimate
PAP^T = LDL^T
Solve AX = B via PAP^T = LDL^T
A^{-1}


Condition estimate
Improve AX = B, A^T X = B solutions with error bounds
Solve AX = B, A^T X = B

4.1 The LDM^T and LDL^T Factorizations


We want to develop a structure-exploiting method for solving symmetric
Ax = b problems. To do this we establish a variant of the LU factorization
in which A is factored into a three-matrix product LDM^T where D is
diagonal and L and M are unit lower triangular. Once this factorization is
obtained, the solution to Ax = b may be found in O(n^2) flops by solving
Ly = b (forward elimination), Dz = y, and M^T x = z (back substitution).
The reason for developing the LDM^T factorization is to set the stage for
the symmetric case, for if A = A^T then L = M and the work associated
with the factorization is half of that required by Gaussian elimination. The
issue of pivoting is taken up in subsequent sections.


4.1.1 The LDM^T Factorization

Our first result connects the LDM^T factorization with the LU factorization.

Theorem 4.1.1 If all the leading principal submatrices of A ∈ R^{n×n} are
nonsingular, then there exist unique unit lower triangular matrices L and M
and a unique diagonal matrix D = diag(d_1,...,d_n) such that A = LDM^T.


Proof. By Theorem 3.2.1 we know that A has an LU factorization A = LU.
Set D = diag(d_1,...,d_n) with d_i = u_ii for i = 1:n. Notice that D is
nonsingular and that M^T = D^{-1}U is unit upper triangular. Thus, A = LU =
LD(D^{-1}U) = LDM^T. Uniqueness follows from the uniqueness of the LU
factorization as described in Theorem 3.2.1. □


The proof shows that the LDM^T factorization can be found by using Gaussian
elimination to compute A = LU and then determining D and M from
the equation U = DM^T. However, an interesting alternative algorithm can
be derived by computing L, D, and M directly.

Assume that we know the first j - 1 columns of L, diagonal entries
d_1,...,d_{j-1} of D, and the first j - 1 rows of M for some j with 1 ≤ j ≤ n.
To develop recipes for L(j+1:n, j), M(j, 1:j-1), and d_j we equate jth
columns in the equation A = LDM^T. In particular,

    A(1:n, j) = Lv                                                 (4.1.1)

where v = DM^T e_j. The “top” half of (4.1.1) defines v(1:j) as the solution
of a known lower triangular system:

    L(1:j, 1:j) v(1:j) = A(1:j, j).

Once we know v then we compute

    d(j) = v(j)
    M(j, i) = v(i)/d(i),    i = 1:j-1.

The “bottom” half of (4.1.1) says L(j+1:n, 1:j) v(1:j) = A(j+1:n, j) which
can be rearranged to obtain a recipe for the jth column of L:

    L(j+1:n, j) v(j) = A(j+1:n, j) - L(j+1:n, 1:j-1) v(1:j-1).

Thus, L(j+1:n, j) is a scaled gaxpy operation and overall we obtain

    for j = 1:n
        Solve L(1:j, 1:j) v(1:j) = A(1:j, j) for v(1:j).
        for i = 1:j-1
            M(j, i) = v(i)/d(i)                                    (4.1.2)
        end
        d(j) = v(j)
        L(j+1:n, j) =
            (A(j+1:n, j) - L(j+1:n, 1:j-1) v(1:j-1)) / v(j)
    end

As with the LU factorization, it is possible to overwrite A with the L, D,
and M factors. If the column version of forward elimination is used to solve
for v(1:j) then we obtain the following procedure:


Algorithm 4.1.1 (LDM^T) If A ∈ R^{n×n} has an LU factorization then
this algorithm computes unit lower triangular matrices L and M and a
diagonal matrix D = diag(d_1,...,d_n) such that A = LDM^T. The entry
a_ij is overwritten with l_ij if i > j, with d_i if i = j, and with m_ji if i < j.

    for j = 1:n
        { Solve L(1:j, 1:j) v(1:j) = A(1:j, j). }
        v(1:j) = A(1:j, j)
        for k = 1:j-1
            v(k+1:j) = v(k+1:j) - v(k) A(k+1:j, k)
        end
        { Compute M(j, 1:j-1) and store in A(1:j-1, j). }
        for i = 1:j-1
            A(i, j) = v(i)/A(i, i)
        end
        { Store d(j) in A(j, j). }
        A(j, j) = v(j)
        { Compute L(j+1:n, j) and store in A(j+1:n, j). }
        for k = 1:j-1
            A(j+1:n, j) = A(j+1:n, j) - v(k) A(j+1:n, k)
        end
        A(j+1:n, j) = A(j+1:n, j)/v(j)
    end


This algorithm involves the same amount of work as the LU factorization,
about 2n^3/3 flops.
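
The column-by-column recipe (4.1.2) can be sketched in a few lines of Python. This is a minimal illustration rather than a tuned implementation: the function name `ldmt` and the list-of-lists storage are our own assumptions, the factors are returned separately instead of overwriting A, and no pivoting is performed, so every leading principal submatrix is assumed to be nonsingular.

```python
def ldmt(A):
    """Return (L, d, M) with A = L * diag(d) * M^T, following recipe (4.1.2).

    Illustrative sketch: A is a square list of lists, no pivoting is done,
    and all leading principal submatrices are assumed nonsingular.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Forward substitution: solve L(1:j,1:j) v(1:j) = A(1:j,j).
        v = [0.0] * (j + 1)
        for i in range(j + 1):
            v[i] = A[i][j] - sum(L[i][k] * v[k] for k in range(i))
        # Row j of M and the jth diagonal entry of D.
        for i in range(j):
            M[j][i] = v[i] / d[i]
        d[j] = v[j]
        # Column j of L: a gaxpy followed by a scaling.
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * v[k] for k in range(j))
            L[i][j] = s / v[j]
    return L, d, M
```

Calling `ldmt([[10, 10, 20], [20, 25, 40], [30, 50, 61]])` produces a unit lower triangular L with subdiagonal entries 2, 3, 4, the diagonal d = (10, 5, 1), and an M whose transpose has first row (1, 1, 2).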

The computed solution x̂ to Ax = b obtained via Algorithm 4.1.1 and
the usual triangular system solvers of §3.1 can be shown to satisfy a
perturbed system (A + E)x̂ = b, where

\[ |E| \le nu \left( 3|A| + 5|\hat L||\hat D||\hat M^T| \right) + O(u^2)  \tag{4.1.3} \]

and L̂, D̂, and M̂ are the computed versions of L, D, and M, respectively.

As in the case of the LU factorization considered in the previous chapter,
the upper bound in (4.1.3) is without limit unless some form of pivoting is
done. Hence, for Algorithm 4.1.1 to be a practical procedure, it must be
modified so as to compute a factorization of the form PA = LDM^T, where
P is a permutation matrix chosen so that the entries in L satisfy |l_ij| ≤ 1.
The details of this are not pursued here since they are straightforward and
since our main object for introducing the LDM^T factorization is to motivate
special methods for symmetric systems.


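Example 4.1.1

\[ A = \begin{bmatrix} 10 & 10 & 20 \\ 20 & 25 & 40 \\ 30 & 50 & 61 \end{bmatrix}
     = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix}
       \begin{bmatrix} 10 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}
       \begin{bmatrix} 1 & 1 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

and upon completion, Algorithm 4.1.1 overwrites A as follows:

\[ A = \begin{bmatrix} 10 & 1 & 2 \\ 2 & 5 & 0 \\ 3 & 4 & 1 \end{bmatrix}. \]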


4.1.2 Symmetry and the LDL^T Factorization

There is redundancy in the LDM^T factorization if A is symmetric.


Theorem 4.1.2 If A = LDM^T is the LDM^T factorization of a nonsingular
symmetric matrix A, then L = M.


Proof. The matrix M^{-1}AM^{-T} = M^{-1}LD is both symmetric and lower
triangular and therefore diagonal. Since D is nonsingular, this implies
that M^{-1}L is also diagonal. But M^{-1}L is unit lower triangular and so
M^{-1}L = I. □


In view of this result, it is possible to halve the work in Algorithm 4.1.1
when it is applied to a symmetric matrix. In the jth step we already know
M(j, 1:j-1) since M = L and we presume knowledge of L's first j - 1
columns. Recall that in the jth step of (4.1.2) the vector v(1:j) is defined
by the first j components of DM^T e_j. Since M = L, this says that

\[ v(1:j) = \begin{bmatrix} d(1)L(j,1) \\ \vdots \\ d(j-1)L(j,j-1) \\ d(j) \end{bmatrix}. \]

Hence, the vector v(1:j-1) can be obtained by a simple scaling of L's jth
row. The formula v(j) = A(j, j) - L(j, 1:j-1) v(1:j-1) can be derived
from the jth equation in L(1:j, 1:j) v = A(1:j, j), rendering

    for j = 1:n
        for i = 1:j-1
            v(i) = L(j, i) d(i)
        end
        v(j) = A(j, j) - L(j, 1:j-1) v(1:j-1)
        d(j) = v(j)
        L(j+1:n, j) =
            (A(j+1:n, j) - L(j+1:n, 1:j-1) v(1:j-1)) / v(j)
    end

With overwriting this becomes


Algorithm 4.1.2 (LDL^T) If A ∈ R^{n×n} is symmetric and has an LU
factorization then this algorithm computes a unit lower triangular matrix
L and a diagonal matrix D = diag(d_1,...,d_n) so A = LDL^T. The entry
a_ij is overwritten with l_ij if i > j and with d_i if i = j.

    for j = 1:n
        { Compute v(1:j). }
        for i = 1:j-1
            v(i) = A(j, i) A(i, i)
        end
        v(j) = A(j, j) - A(j, 1:j-1) v(1:j-1)
        { Store d(j) and compute L(j+1:n, j). }
        A(j, j) = v(j)
        A(j+1:n, j) =
            (A(j+1:n, j) - A(j+1:n, 1:j-1) v(1:j-1)) / v(j)
    end

This algorithm requires n^3/3 flops, about half the number of flops involved
in Gaussian elimination.
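
For readers who want to experiment, here is a small Python sketch of the same computation. It is an illustration under our own assumptions: the name `ldlt` is hypothetical, L and d are returned rather than written over A, only the lower triangle of A is read, and no pivoting is attempted.

```python
def ldlt(A):
    """Return (L, d) with A = L * diag(d) * L^T for symmetric A.

    Illustrative sketch of the LDL^T recipe: only the lower triangle of A
    is referenced, and every pivot d[j] is assumed to be nonzero.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # v(1:j-1) is row j of L scaled by the known diagonal entries.
        v = [L[j][i] * d[i] for i in range(j)]
        vj = A[j][j] - sum(L[j][i] * v[i] for i in range(j))
        d[j] = vj
        # Column j of L below the diagonal.
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * v[k] for k in range(j))
            L[i][j] = s / vj
    return L, d
```

Applied to the symmetric matrix with rows (10, 20, 30), (20, 45, 80), (30, 80, 171), the sketch returns d = (10, 5, 1) and the subdiagonal entries 2, 3, 4 in L.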

In the next section, we show that if A is both symmetric and positive
definite, then Algorithm 4.1.2 not only runs to completion, but is extremely
stable. If A is symmetric but not positive definite, then pivoting may be
necessary and the methods of §4.4 are relevant.


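Example 4.1.2

\[ A = \begin{bmatrix} 10 & 20 & 30 \\ 20 & 45 & 80 \\ 30 & 80 & 171 \end{bmatrix}
     = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix}
       \begin{bmatrix} 10 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}
       \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 1 \end{bmatrix} \]

and so if Algorithm 4.1.2 is applied, A is overwritten by

\[ A = \begin{bmatrix} 10 & 20 & 30 \\ 2 & 5 & 80 \\ 3 & 4 & 1 \end{bmatrix}. \]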


Problems 


P4.1.1 Show that the LDM^T factorization of a nonsingular A is unique if it exists.


P4.1.2 Modify Algorithm 4.1.1 so that it computes a factorization of the form PA =
LDM^T, where L and M are both unit lower triangular, D is diagonal, and P is a
permutation that is chosen so |l_ij| ≤ 1.

P4.1.3 Suppose the n-by-n symmetric matrix A = (a_ij) is stored in a vector c as
follows: c = (a_11, a_21, ..., a_n1, a_22, ..., a_n2, ..., a_nn). Rewrite Algorithm 4.1.2 with A
stored in this fashion. Get as much indexing outside the inner loops as possible.

P4.1.4 Rewrite Algorithm 4.1.2 for A stored by diagonal. See §1.2.8.


Notes and References for Sec. 4.1 


Algorithm 4.1.1 is related to the methods of Crout and Doolittle in that outer product
updates are avoided. See Chapter 4 of Fox (1964) or Stewart (1973, 131-149). An Algol
procedure may be found in


H.J. Bowdler, R.S. Martin, G. Peters, and J.H. Wilkinson (1966). “Solution of Real and
Complex Systems of Linear Equations,” Numer. Math. 8, 217-234.


See also


G.E. Forsythe (1960). “Crout with Pivoting,” Comm. ACM 3, 507-08.
W.M. McKeeman (1962). “Crout with Equilibration and Iteration,” Comm. ACM 5,
553-55.


Just as algorithms can be tailored to exploit structure, so can error analysis and
perturbation theory:


M. Arioli, J. Demmel, and I. Duff (1989). “Solving Sparse Linear Systems with Sparse
Backward Error,” SIAM J. Matrix Anal. Appl. 10, 165-190.

J.R. Bunch, J.W. Demmel, and C.F. Van Loan (1989). “The Strong Stability of
Algorithms for Solving Symmetric Linear Systems,” SIAM J. Matrix Anal. Appl. 10,
494-499.

A. Barrlund (1991). “Perturbation Bounds for the LDL^T and LU Decompositions,”
BIT 31, 358-363.

D.J. Higham and N.J. Higham (1992). “Backward Error and Condition of Structured
Linear Systems,” SIAM J. Matrix Anal. Appl. 13, 162-175.


4.2 Positive Definite Systems 
A matrix A ∈ R^{n×n} is positive definite if x^T A x > 0 for all nonzero x ∈ R^n.


Positive definite systems constitute one of the most important classes of
special Ax = b problems. Consider the 2-by-2 symmetric case. If

\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]

is positive definite then

\[ \begin{aligned}
x = (1, 0)^T  &\;\Rightarrow\; x^T A x = a_{11} > 0 \\
x = (0, 1)^T  &\;\Rightarrow\; x^T A x = a_{22} > 0 \\
x = (1, 1)^T  &\;\Rightarrow\; x^T A x = a_{11} + 2a_{12} + a_{22} > 0 \\
x = (1, -1)^T &\;\Rightarrow\; x^T A x = a_{11} - 2a_{12} + a_{22} > 0.
\end{aligned} \]

The last two equations imply |a_12| ≤ (a_11 + a_22)/2. From these results we
see that the largest entry in A is on the diagonal and that it is positive. This
turns out to be true in general. A symmetric positive definite matrix has
a “weighty” diagonal. The mass on the diagonal is not blatantly obvious
as in the case of diagonal dominance but it has the same effect in that it
precludes the need for pivoting. See §3.4.10.

We begin with a few comments about the property of positive definiteness
and what it implies in the unsymmetric case with respect to pivoting.
We then focus on the efficient organization of the Cholesky procedure which
can be used to safely factor a symmetric positive definite A. Gaxpy, outer
product, and block versions are developed. The section concludes with a
few comments about the semidefinite case.


4.2.1 Positive Definiteness 


Suppose A ∈ R^{n×n} is positive definite. It is obvious that a positive definite
matrix is nonsingular for otherwise we could find a nonzero x so x^T A x = 0.
However, much more is implied by the positivity of the quadratic form
x^T A x as the following results show.


Theorem 4.2.1 If A ∈ R^{n×n} is positive definite and X ∈ R^{n×k} has rank
k, then B = X^T A X ∈ R^{k×k} is also positive definite.


Proof. If z ∈ R^k satisfies 0 ≥ z^T B z = (Xz)^T A (Xz) then Xz = 0. But
since X has full column rank, this implies that z = 0. □


Corollary 4.2.2 If A is positive definite then all its principal submatrices
are positive definite. In particular, all the diagonal entries are positive.


Proof. If v ∈ R^k is an integer vector with 1 ≤ v_1 < ··· < v_k ≤ n, then
X = I_n(:, v) is a rank k matrix made up of columns v_1,...,v_k of the identity.
It follows from Theorem 4.2.1 that A(v, v) = X^T A X is positive definite. □


Corollary 4.2.3 If A is positive definite then the factorization A = LDM^T
exists and D = diag(d_1,...,d_n) has positive diagonal entries.


Proof. From Corollary 4.2.2, it follows that the submatrices A(1:k, 1:k)
are nonsingular for k = 1:n and so from Theorem 4.1.1 the factorization
A = LDM^T exists. If we apply Theorem 4.2.1 with X = L^{-T} then B =
DM^T L^{-T} = L^{-1}AL^{-T} is positive definite. Since M^T L^{-T} is unit upper
triangular, B and D have the same diagonal and it must be positive. □
There are several typical situations that give rise to positive definite
matrices in practice:

• The quadratic form is an energy function whose positivity is
  guaranteed from physical principles.

• The matrix A equals a cross-product X^T X where X has full column
  rank. (Positive definiteness follows by setting A = I_n in Theorem
  4.2.1.)

• Both A and A^T are diagonally dominant and each a_ii is positive.


4.2.2 Unsymmetric Positive Definite Systems


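The mere existence of an LDM^T factorization does not mean that its
computation is advisable because the resulting factors may have unacceptably
large elements. For example, if ε > 0 then the matrix

\[ A = \begin{bmatrix} \epsilon & m \\ -m & \epsilon \end{bmatrix}
     = \begin{bmatrix} 1 & 0 \\ -m/\epsilon & 1 \end{bmatrix}
       \begin{bmatrix} \epsilon & 0 \\ 0 & \epsilon + m^2/\epsilon \end{bmatrix}
       \begin{bmatrix} 1 & m/\epsilon \\ 0 & 1 \end{bmatrix} \]

is positive definite. But if m/ε ≫ 1, then pivoting is recommended.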
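
To see the growth concretely, the following Python sketch evaluates the LDM^T factors of a 2-by-2 matrix of the form [[eps, m], [-m, eps]] in closed form. The function name `ldmt_2x2` and its return layout are our own illustrative choices; the formulas are simply what one step of elimination without pivoting produces for this matrix.

```python
def ldmt_2x2(eps, m):
    """LDM^T factors of A = [[eps, m], [-m, eps]] with eps > 0, no pivoting.

    Returns (l21, (d1, d2), m21), where L = [[1, 0], [l21, 1]],
    D = diag(d1, d2) and M^T = [[1, m21], [0, 1]].
    """
    l21 = -m / eps            # multiplier that eliminates the (2,1) entry
    m21 = m / eps             # the matching entry of M^T
    d1 = eps
    d2 = eps + m * m / eps    # Schur complement: eps - (-m)(m)/eps
    return l21, (d1, d2), m21
```

With eps = 0.25 and m = 1 the multipliers are already -4 and 4 and the second pivot is 4.25, although no entry of A exceeds 1 in magnitude. Shrinking eps makes the factors grow like m/eps and m^2/eps.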
The following result suggests when to expect element growth in the
LDM^T factorization of a positive definite matrix.


Theorem 4.2.4 Let A ∈ R^{n×n} be positive definite and set T = (A + A^T)/2
and S = (A - A^T)/2. If A = LDM^T, then

\[ \| \, |L||D||M^T| \, \|_F \le n \left( \| T \|_2 + \| S T^{-1} S \|_2 \right)  \tag{4.2.1} \]


Proof. See Golub and Van Loan (1979). □


The theorem suggests when it is safe not to pivot. Assume that the
computed factors L̂, D̂, and M̂ satisfy:

\[ \| \, |\hat L||\hat D||\hat M^T| \, \|_F \le c \, \| \, |L||D||M^T| \, \|_F ,  \tag{4.2.2} \]

where c is a constant of modest size. It follows from (4.2.1) and the analysis
in §3.3 that if these factors are used to compute a solution to Ax = b, then
the computed solution x̂ satisfies (A + E)x̂ = b with

\[ \| E \|_F \le u \left( 3n \| A \|_F + 5cn^2 \left( \| T \|_2 + \| S T^{-1} S \|_2 \right) \right) + O(u^2).  \tag{4.2.3} \]


It is easy to show that ‖T‖_2 ≤ ‖A‖_2, and so it follows that if

\[ \Omega = \frac{\| S T^{-1} S \|_2}{\| A \|_2}  \tag{4.2.4} \]

is not too large then it is safe not to pivot. In other words, the norm of the
skew part S has to be modest relative to the condition of the symmetric
part T. Sometimes it is possible to estimate Ω in an application. This is
trivially the case when A is symmetric for then Ω = 0.


4.2.3 Symmetric Positive Definite Systems 


When we apply the above results to a symmetric positive definite system
we know that the factorization A = LDL^T exists and moreover is stable to
compute. However, in this situation another factorization is available.

Theorem 4.2.5 (Cholesky Factorization) If A ∈ R^{n×n} is symmetric
positive definite, then there exists a unique lower triangular G ∈ R^{n×n} with
positive diagonal entries such that A = GG^T.


Proof. From Theorem 4.1.2, there exists a unit lower triangular L and a
diagonal D = diag(d_1,...,d_n) such that A = LDL^T. Since the d_k are
positive, the matrix G = L diag(√d_1,...,√d_n) is real lower triangular with
positive diagonal entries. It also satisfies A = GG^T. Uniqueness follows
from the uniqueness of the LDL^T factorization. □


The factorization A = GG^T is known as the Cholesky factorization and G
is referred to as the Cholesky triangle. Note that if we compute the Cholesky
factorization and solve the triangular systems Gy = b and G^T x = y, then
b = Gy = G(G^T x) = (GG^T)x = Ax.

Our proof of the Cholesky factorization in Theorem 4.2.5 is constructive.
However, more effective methods for computing the Cholesky triangle can
be derived by manipulating the equation A = GG^T. This can be done in
several ways as we show in the next few subsections.


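Example 4.2.1 The matrix

\[ \begin{bmatrix} 2 & -2 \\ -2 & 5 \end{bmatrix}
   = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}
     \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}
     \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}
   = \begin{bmatrix} \sqrt{2} & 0 \\ -\sqrt{2} & \sqrt{3} \end{bmatrix}
     \begin{bmatrix} \sqrt{2} & -\sqrt{2} \\ 0 & \sqrt{3} \end{bmatrix} \]

is positive definite.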


4.2.4 Gaxpy Cholesky


We first derive an implementation of Cholesky that is rich in the gaxpy
operation. If we compare jth columns in the equation A = GG^T then we
obtain

\[ A(:, j) = \sum_{k=1}^{j} G(j, k)\, G(:, k). \]

This says that

\[ G(j, j)\, G(:, j) = A(:, j) - \sum_{k=1}^{j-1} G(j, k)\, G(:, k) \equiv v.  \tag{4.2.5} \]

If we know the first j - 1 columns of G, then v is computable. It follows
by equating components in (4.2.5) that

\[ G(j{:}n, j) = v(j{:}n) / \sqrt{v(j)}. \]

This is a scaled gaxpy operation and so we obtain the following gaxpy-based
method for computing the Cholesky factorization:

    for j = 1:n
        v(j:n) = A(j:n, j)
        for k = 1:j-1
            v(j:n) = v(j:n) - G(j, k) G(j:n, k)
        end
        G(j:n, j) = v(j:n)/sqrt(v(j))
    end

It is possible to arrange the computations so that G overwrites the lower
triangle of A.


Algorithm 4.2.1 (Cholesky: Gaxpy Version) Given a symmetric
positive definite A ∈ R^{n×n}, the following algorithm computes a lower
triangular G ∈ R^{n×n} such that A = GG^T. For all i ≥ j, G(i, j) overwrites
A(i, j).

    for j = 1:n
        if j > 1
            A(j:n, j) = A(j:n, j) - A(j:n, 1:j-1) A(j, 1:j-1)^T
        end
        A(j:n, j) = A(j:n, j)/sqrt(A(j, j))
    end

This algorithm requires n^3/3 flops.
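
A plain Python rendering of the gaxpy-based method, together with the two triangular solves Gy = b and G^T x = y, might look as follows. This is a sketch under our own assumptions: the names `cholesky_gaxpy` and `cholesky_solve` are hypothetical, G is returned as a separate list of lists instead of overwriting A, and a `ValueError` is raised when a nonpositive pivot shows that the input is not positive definite.

```python
import math


def cholesky_gaxpy(A):
    """Lower triangular G with A = G G^T, built one column at a time."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # v(j:n) = A(j:n, j) - sum_{k<j} G(j, k) G(j:n, k)   (a gaxpy)
        v = [A[i][j] - sum(G[j][k] * G[i][k] for k in range(j))
             for i in range(j, n)]
        if v[0] <= 0.0:
            raise ValueError("matrix is not positive definite")
        root = math.sqrt(v[0])
        for i in range(j, n):
            G[i][j] = v[i - j] / root
    return G


def cholesky_solve(G, b):
    """Solve (G G^T) x = b via G y = b followed by G^T x = y."""
    n = len(G)
    y = [0.0] * n
    for i in range(n):                      # forward substitution
        y[i] = (b[i] - sum(G[i][k] * y[k] for k in range(i))) / G[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):            # back substitution with G^T
        x[i] = (y[i] - sum(G[k][i] * x[k] for k in range(i + 1, n))) / G[i][i]
    return x
```

For instance, factoring the matrix with rows (4, 2, 2), (2, 5, 3), (2, 3, 6) and then solving with right-hand side (8, 10, 11) recovers the vector of all ones.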


4.2.5 Outer Product Cholesky 


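An alternative Cholesky procedure based on outer product (rank-1) updates
can be derived from the partitioning

\[ A = \begin{bmatrix} \alpha & v^T \\ v & B \end{bmatrix}
     = \begin{bmatrix} \beta & 0 \\ v/\beta & I_{n-1} \end{bmatrix}
       \begin{bmatrix} 1 & 0 \\ 0 & B - vv^T/\alpha \end{bmatrix}
       \begin{bmatrix} \beta & v^T/\beta \\ 0 & I_{n-1} \end{bmatrix}.  \tag{4.2.6} \]

Here, β = √α and we know that α > 0 because A is positive definite. Note
that B - vv^T/α is positive definite because it is a principal submatrix of
X^T A X where

\[ X = \begin{bmatrix} 1 & -v^T/\alpha \\ 0 & I_{n-1} \end{bmatrix}. \]

If we have the Cholesky factorization G_1 G_1^T = B - vv^T/α, then from (4.2.6)
it follows that A = GG^T with

\[ G = \begin{bmatrix} \beta & 0 \\ v/\beta & G_1 \end{bmatrix}. \]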

Thus, the Cholesky factorization can be obtained through the repeated
application of (4.2.6), much in the style of kji Gaussian elimination.

Algorithm 4.2.2 (Cholesky: Outer Product Version) Given a symmetric
positive definite A ∈ R^{n×n}, the following algorithm computes a lower
triangular G ∈ R^{n×n} such that A = GG^T. For all i ≥ j, G(i, j) overwrites
A(i, j).

    for k = 1:n
        A(k, k) = sqrt(A(k, k))
        A(k+1:n, k) = A(k+1:n, k)/A(k, k)
        for j = k+1:n
            A(j:n, j) = A(j:n, j) - A(j:n, k) A(j, k)
        end
    end

This algorithm involves n^3/3 flops. Note that the j-loop computes the lower
triangular part of the outer product update

    A(k+1:n, k+1:n) = A(k+1:n, k+1:n) - A(k+1:n, k) A(k+1:n, k)^T.

Recalling our discussion in §1.4.8 about gaxpy versus outer product
updates, it is easy to show that Algorithm 4.2.1 involves fewer vector touches
than Algorithm 4.2.2 by a factor of two.
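
The outer product organization translates almost line for line into Python. The sketch below is illustrative only: the name `cholesky_outer` is hypothetical, the input is copied rather than overwritten, and only the lower triangle of the working copy is updated, as in the j-loop above.

```python
import math


def cholesky_outer(A):
    """Lower triangular G with A = G G^T via repeated rank-1 updates."""
    n = len(A)
    W = [list(row) for row in A]            # working copy of A
    for k in range(n):
        if W[k][k] <= 0.0:
            raise ValueError("matrix is not positive definite")
        W[k][k] = math.sqrt(W[k][k])
        for i in range(k + 1, n):
            W[i][k] /= W[k][k]
        # Lower triangular part of the update
        # W(k+1:n, k+1:n) -= W(k+1:n, k) W(k+1:n, k)^T.
        for j in range(k + 1, n):
            for i in range(j, n):
                W[i][j] -= W[i][k] * W[j][k]
    # Discard the untouched strict upper triangle.
    return [[W[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

In exact arithmetic the result agrees with the gaxpy organization. The two differ only in the order in which the same updates are applied.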


4.2.6 Block Dot Product Cholesky


Suppose A ∈ R^{n×n} is symmetric positive definite. Regard A = (A_ij) and its
Cholesky factor G = (G_ij) as N-by-N block matrices with square diagonal
blocks. By equating (i, j) blocks in the equation A = GG^T with i ≥ j it
follows that

\[ A_{ij} = \sum_{k=1}^{j} G_{ik} G_{jk}^T. \]

Defining

\[ S = A_{ij} - \sum_{k=1}^{j-1} G_{ik} G_{jk}^T, \]

we see that G_jj G_jj^T = S if i = j and that G_ij G_jj^T = S if i > j. Properly
sequenced, these equations can be arranged to compute all the G_ij:


Algorithm 4.2.3 (Cholesky: Block Dot Product Version) Given a
symmetric positive definite A ∈ R^{n×n}, the following algorithm computes a
lower triangular G ∈ R^{n×n} such that A = GG^T. The lower triangular part
of A is overwritten by the lower triangular part of G. A is regarded as an
N-by-N block matrix with square diagonal blocks.

    for j = 1:N
        for i = j:N
            S = A_ij - sum_{k=1}^{j-1} G_ik G_jk^T
            if i = j
                Compute Cholesky factorization S = G_jj G_jj^T.
            else
                Solve G_ij G_jj^T = S for G_ij.
            end
            Overwrite A_ij with G_ij.
        end
    end


The overall process involves n^3/3 flops like the other Cholesky procedures
that we have developed. The procedure is rich in matrix multiplication
assuming a suitable blocking of the matrix A. For example, if n = rN and
each A_ij is r-by-r, then the level-3 fraction is approximately 1 - (1/N^2).

Algorithm 4.2.3 is incomplete in the sense that we have not specified how
the products G_ik G_jk^T are formed or how the r-by-r Cholesky factorizations
S = G_jj G_jj^T are computed. These important details would have to be
worked out carefully in order to extract high performance.

Another block procedure can be derived from the gaxpy Cholesky
algorithm. After r steps of Algorithm 4.2.1 we know the matrices G_11 ∈ R^{r×r}
and G_21 ∈ R^{(n-r)×r} in

\[ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
   = \begin{bmatrix} G_{11} & 0 \\ G_{21} & I_{n-r} \end{bmatrix}
     \begin{bmatrix} I_r & 0 \\ 0 & \tilde A \end{bmatrix}
     \begin{bmatrix} G_{11} & 0 \\ G_{21} & I_{n-r} \end{bmatrix}^T. \]

We then perform r more steps of gaxpy Cholesky not on A but on the
reduced matrix Ã = A_22 - G_21 G_21^T which we explicitly form exploiting
symmetry. Continuing in this way we obtain a block Cholesky algorithm
whose kth step involves r gaxpy Cholesky steps on a matrix of order
n - (k-1)r followed by a level-3 computation having order n - kr. The level-3
fraction is approximately equal to 1 - 3/(2N) if n ≈ rN.


4.2.7 Stability of the Cholesky Process 


In exact arithmetic, we know that a symmetric positive definite matrix
has a Cholesky factorization. Conversely, if the Cholesky process runs to
completion with strictly positive square roots, then A is positive definite.
Thus, to find out if a matrix A is positive definite, we merely try to compute
its Cholesky factorization using any of the methods given above.
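
As a small illustration of this test, the Python sketch below attempts a Cholesky factorization and reports whether every pivot is strictly positive. The name `is_positive_definite` is our own, the input is assumed to be symmetric (only its lower triangle is read), and in floating point the answer for nearly singular matrices is subject to the roundoff effects discussed next.

```python
import math


def is_positive_definite(A):
    """True if Cholesky on symmetric A completes with positive pivots."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = A[j][j] - sum(G[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            return False                    # square root would fail here
        G[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            s = A[i][j] - sum(G[i][k] * G[j][k] for k in range(j))
            G[i][j] = s / G[j][j]
    return True
```

For example, the test accepts the matrix with rows (2, -2), (-2, 5) and rejects both the indefinite matrix with rows (1, 2), (2, 1) and the singular matrix of all ones.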

The situation in the context of roundoff error is more interesting. The
numerical stability of the Cholesky algorithm roughly follows from the
inequality

\[ g_{ij}^2 \le \sum_{k=1}^{i} g_{ik}^2 = a_{ii}. \]

This shows that the entries in the Cholesky triangle are nicely bounded.
The same conclusion can be reached from the equation ‖G‖_2^2 = ‖A‖_2.

The roundoff errors associated with the Cholesky factorization have
been extensively studied in a classical paper by Wilkinson (1968). Using
the results in this paper, it can be shown that if x̂ is the computed solution
to Ax = b, obtained via any of our Cholesky procedures, then x̂ solves
the perturbed system (A + E)x̂ = b where ‖E‖_2 ≤ c_n u ‖A‖_2 and c_n
is a small constant depending upon n. Moreover, Wilkinson shows that if
q_n u κ_2(A) ≤ 1 where q_n is another small constant, then the Cholesky process
runs to completion, i.e., no square roots of negative numbers arise.


Example 4.2.2 If Algorithm 4.2.2 is applied to the positive definite matrix

\[ A = \begin{bmatrix} 100 & 15 & .01 \\ 15 & 2.3 & .01 \\ .01 & .01 & 1.00 \end{bmatrix} \]

and β = 10, t = 2, rounded arithmetic is used, then ĝ_11 = 10, ĝ_21 = 1.5, ĝ_31 = .001 and
ĝ_22 = 0.00. The algorithm then breaks down trying to compute g_32.


4.2.8 The Semidefinite Case 


A matrix is said to be positive semidefinite if x^T A x ≥ 0 for all vectors
x. Symmetric positive semidefinite (sps) matrices are important and we
briefly discuss some Cholesky-like manipulations that can be used to solve
various sps problems. Results about the diagonal entries in an sps matrix
are needed first.


Theorem 4.2.6 If A ∈ R^{n×n} is symmetric positive semidefinite, then

\[ |a_{ij}| \le (a_{ii} + a_{jj})/2  \tag{4.2.7} \]

\[ |a_{ij}| \le \sqrt{a_{ii} a_{jj}} \quad (i \ne j)  \tag{4.2.8} \]

\[ \max_{i,j} |a_{ij}| = \max_{i} a_{ii}  \tag{4.2.9} \]

\[ a_{ii} = 0 \;\Rightarrow\; A(i, :) = 0, \; A(:, i) = 0  \tag{4.2.10} \]


Proof. If x = e_i + e_j then 0 ≤ x^T A x = a_ii + a_jj + 2a_ij while x = e_i - e_j
implies 0 ≤ x^T A x = a_ii + a_jj - 2a_ij. Inequality (4.2.7) follows from these
two results. Equation (4.2.9) is an easy consequence of (4.2.7).

To prove (4.2.8) assume without loss of generality that i = 1 and j = 2
and consider the inequality

\[ 0 \le \begin{bmatrix} x \\ 1 \end{bmatrix}^T
         \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
         \begin{bmatrix} x \\ 1 \end{bmatrix}
       = a_{11} x^2 + 2 a_{12} x + a_{22} \]

which holds since A(1:2, 1:2) is also semidefinite. This is a quadratic
in x and for the inequality to hold for all x, the discriminant 4a_12^2 - 4a_11 a_22
must be nonpositive. Implication (4.2.10) follows from (4.2.8). □


Consider what happens when outer product Cholesky is applied to an sps
matrix. If a zero A(k, k) is encountered then from (4.2.10) A(k:n, k) is zero
and there is “nothing to do” and we obtain

    for k = 1:n
        if A(k, k) > 0
            A(k, k) = sqrt(A(k, k))
            A(k+1:n, k) = A(k+1:n, k)/A(k, k)
            for j = k+1:n
                A(j:n, j) = A(j:n, j) - A(j:n, k) A(j, k)          (4.2.11)
            end
        end
    end

Thus, a simple change makes Algorithm 4.2.2 applicable to the semidefinite
case. However, in practice rounding errors preclude the generation of exact
zeros and it may be preferable to incorporate pivoting.


4.2.9 Symmetric Pivoting 


To preserve symmetry in a symmetric A we only consider data reorderings
of the form PAP^T where P is a permutation. Row permutations (A ← PA)
or column permutations (A ← AP) alone destroy symmetry. An update of
the form

    A ← PAP^T

is called a symmetric permutation of A. Note that such an operation does
not move off-diagonal elements to the diagonal. The diagonal of PAP^T is
a reordering of the diagonal of A.

Suppose at the beginning of the kth step in (4.2.11) we symmetrically
permute the largest diagonal entry of A(k:n, k:n) into the lead position.
If that largest diagonal entry is zero then A(k:n, k:n) = 0 by virtue of
(4.2.10). In this way we can compute the factorization PAP^T = GG^T
where G ∈ R^{n×(k-1)} is lower triangular.


Algorithm 4.2.4 Suppose A ∈ R^{n×n} is symmetric positive semidefinite
and that rank(A) = r. The following algorithm computes a permutation P,
the index r, and an n-by-r lower triangular matrix G such that PAP^T =
GG^T. The lower triangular part of A(:, 1:r) is overwritten by the lower
triangular part of G. P = P_r ··· P_1 where P_k is the identity with rows k
and piv(k) interchanged.

    r = 0
    for k = 1:n
        Find q (k ≤ q ≤ n) so A(q, q) = max{A(k, k), ..., A(n, n)}
        if A(q, q) > 0
            r = r + 1
            piv(k) = q
            A(k, :) ↔ A(q, :)
            A(:, k) ↔ A(:, q)
            A(k, k) = sqrt(A(k, k))
            A(k+1:n, k) = A(k+1:n, k)/A(k, k)
            for j = k+1:n
                A(j:n, j) = A(j:n, j) - A(j:n, k) A(j, k)
            end
        end
    end

In practice, a tolerance is used to detect small A(k, k). However, the
situation is quite tricky and the reader should consult Higham (1989). In
addition, §5.5 has a discussion of tolerances in the rank detection problem.
Finally, we remark that a truly efficient implementation of Algorithm 4.2.4
would only access the lower triangular portion of A.
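
The following Python sketch shows one way the symmetric pivoting strategy could be coded. It rests on several assumptions of our own: the name `cholesky_pivoted`, the `tol` parameter (a stand-in for the tolerance mentioned above, with no claim about how it should be chosen), and the use of a permutation list `perm` in place of the piv array. For simplicity the whole trailing block is updated, not just its lower triangle, so that later row and column interchanges always see current values. The loop also stops at the first nonpositive pivot, which in exact arithmetic is equivalent to skipping the remaining steps.

```python
import math


def cholesky_pivoted(A, tol=0.0):
    """Return (G, perm, r) with B = G G^T, where B[i][j] = A[perm[i]][perm[j]].

    Illustrative sketch for symmetric positive semidefinite A; G is n-by-r.
    """
    n = len(A)
    W = [list(row) for row in A]            # working copy of A
    perm = list(range(n))
    r = 0
    for k in range(n):
        # Largest remaining diagonal entry (first one in case of ties).
        q = max(range(k, n), key=lambda i: W[i][i])
        if W[q][q] <= tol:
            break                           # trailing block is (numerically) zero
        r += 1
        # Symmetric interchange of rows and columns k and q.
        W[k], W[q] = W[q], W[k]
        for row in W:
            row[k], row[q] = row[q], row[k]
        perm[k], perm[q] = perm[q], perm[k]
        W[k][k] = math.sqrt(W[k][k])
        for i in range(k + 1, n):
            W[i][k] /= W[k][k]
        # Full symmetric update of the trailing block.
        for j in range(k + 1, n):
            for i in range(k + 1, n):
                W[i][j] -= W[i][k] * W[j][k]
    G = [[W[i][j] if j <= i else 0.0 for j in range(r)] for i in range(n)]
    return G, perm, r
```

For the rank-2 matrix with rows (1, 1, 0), (1, 1, 0), (0, 0, 4), the sketch moves the entry 4 to the lead position and returns r = 2 together with a 3-by-2 factor.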


4.2.10 The Polar Decomposition and Square Root 
Let A = U_1 Σ_1 V^T be the thin SVD of A ∈ R^{m×n} where m ≥ n. Note that

\[ A = (U_1 V^T)(V \Sigma_1 V^T) \equiv ZP  \tag{4.2.12} \]

where Z = U_1 V^T and P = V Σ_1 V^T. Z has orthonormal columns and P is
symmetric positive semidefinite because

\[ x^T P x = (V^T x)^T \Sigma_1 (V^T x) = \sum_{k=1}^{n} \sigma_k y_k^2 \ge 0 \]

where y = V^T x. The decomposition (4.2.12) is called the polar
decomposition because it is analogous to the complex number factorization
z = e^{i arg(z)} |z|. See §12.4.1 for further discussion.

Another important decomposition is the matrix square root. Suppose
A ∈ R^{n×n} is symmetric positive semidefinite and that A = GG^T is its
Cholesky factorization. If G = UΣV^T is G's SVD and X = UΣU^T, then
X is symmetric positive semidefinite and

\[ A = GG^T = (U \Sigma V^T)(U \Sigma V^T)^T = U \Sigma^2 U^T = (U \Sigma U^T)(U \Sigma U^T) = X^2. \]

Thus, X is a square root of A. It can be shown (most easily with
eigenvalue theory) that a symmetric positive semidefinite matrix has a unique
symmetric positive semidefinite square root.

Problems


P4.2.1 Suppose that H = A + iB is Hermitian and positive definite with A, B ∈ R^{n×n}.
This means that x^H H x > 0 whenever x ≠ 0. (a) Show that

\[ C = \begin{bmatrix} A & -B \\ B & A \end{bmatrix} \]

is symmetric and positive definite. (b) Formulate an algorithm for solving (A + iB)(x + iy)
= (b + ic), where b, c, x, and y are in R^n. It should involve 8n^3/3 flops. How much
storage is required?

P4.2.2 Suppose A ∈ R^{n×n} is symmetric and positive definite. Give an algorithm for
computing an upper triangular matrix R ∈ R^{n×n} such that A = RR^T.

P4.2.3 Let A ∈ R^{n×n} be positive definite and set T = (A + A^T)/2 and S = (A - A^T)/2.
(a) Show that ‖A^{-1}‖_2 ≤ ‖T^{-1}‖_2 and x^T A^{-1} x ≤ x^T T^{-1} x for all x ∈ R^n. (b) Show
that if A = LDM^T, then d_k ≥ 1/‖T^{-1}‖_2 for k = 1:n.

P4.2.4 Find a 2-by-2 real matrix A with the property that x^T A x > 0 for all real nonzero
2-vectors but which is not positive definite when regarded as a member of C^{2×2}.


P4.2.5 Suppose A ∈ R^{n×n} has a positive diagonal. Show that if both A and A^T are
strictly diagonally dominant, then A is positive definite.

P4.2.6 Show that the function f(x) = (x^T A x)^{1/2} is a vector norm on R^n if and only if
A is positive definite.

P4.2.7 Modify Algorithm 4.2.1 so that if the square root of a negative number is
encountered, then the algorithm finds a unit vector x so x^T A x < 0 and terminates.


P4.2.8 The numerical range W(A) of a complex matrix A is defined to be the set
W(A) = {x^H A x : x^H x = 1}. Show that if 0 ∉ W(A), then A has an LU factorization.


P4.2.9 Formulate an m < n version of the polar decomposition for A ∈ R^{m×n}.


P4.2.10 Suppose A = I + uu^T where A ∈ R^{n×n} and ‖u‖_2 = 1. Give explicit formulae
for the diagonal and subdiagonal of A's Cholesky factor.


P4.2.11 Suppose A ∈ R^{n×n} is symmetric positive definite and that its Cholesky factor
is available. Let e_k = I_n(:, k). For 1 ≤ i < j ≤ n, let α_ij be the smallest real that makes
A + α(e_i e_j^T + e_j e_i^T) singular. Likewise, let α_ii be the smallest real that makes (A + α e_i e_i^T)
singular. Show how to compute these quantities using the Sherman-Morrison-Woodbury
formula. How many flops are required to find all the α_ij?


Notes and References for Sec. 4.2 


The definiteness of the quadratic form x^T A x can frequently be established by considering
the mathematics of the underlying problem. For example, the discretization of certain
partial differential operators gives rise to provably positive definite matrices. Aspects of
the unsymmetric positive definite problem are discussed in


A. Buckley (1974). “A Note on Matrices A = I + H, H Skew-Symmetric,” Z. Angew.
Math. Mech. 54, 125-26.




A. Buckley (1977). “On the Solution of Certain Skew-Symmetric Linear Systems,” SIAM 
J. Num. Anal 14, 566-70. 

G.H. Golub and C. Van Loan (1979). "Unsymmetric Positive Definite Linear Systema," 
Lin. Aig. and Its Applic. 28, 85-98. 

R. Mathias (1992). "Matrices with Positive Definite Hermitian Part: Inequalities and 
Linear Systema," SIAM J. Matrix Anal Appl. 13, 640-654, 


Symmetric positive definite systema constitute the most important class of special Az = b 
problems. Algol programs for these problems are given in 


R.S. Martin, G. Peters, and J.H. Wilkinson (1965). “Symmetric Decomposition of a 
Positive Definite Matrix,” Numer. Math. 7, 362-83. 

R.S. Martin, G. Peters, and J.H. Wilkinson (1966). “Iterative Refinement of the Solution 
of a Positive Definite System of Equations,” Numer. Math. 8, 203-16. 

F.L. Bauer and C. Reinsch (1971). "Inversion of Positive Definite Matrices by the Gauss- 
Jordan Method,” in Handbook for Automatic Computation Vol. 2, Linear Algebra, 
J.H. Wilkinson and C. Reinsch, eds. Springer-Verlag, New York, 45-49. 


The roundoff errors associated with the method are analyzed in 


J.H. Wilkinson (1968). “À Priori Error Analysis of Algebraic Processes," Proc. Inter- 
national Congress Math. (Moecow: Izdat. Mir, 1968), pp. 629-39, 

J. Meinguet (1983), “Refined Error Analyses of Choleaky Factorization,” SIAM J. Nu- 
mer. Anal $0, 1243-1250. 

A. Kielbasinski (1987). “A Note on Rounding Error Analysis of Cholesky Factorization,” 
Lin. Alg. and Its Applic. 88/89, 481—494. 

N.J. Higham (1990). “Analysis of the Choleaky Decomposition of a Semidefinite Matrix," 
in ReliaMe Numerical Computation, M.G. Cox and S.J. Hammarling (eda), Oxford 
University Press, Oxford, UK, 151-185. 

R. Carter (1991). *Y-MP Floating Point and Cholesky Factorization,” Int'l J. High 
Speed Computing 3, 215-222. 

J-Guang Sun (1992). “Rounding Error and Perturbation Bounds for the Cholesky and 
LDL? Factorizations,” Lin. Alg. and Its Applic. 173, 77-97. 


The question of how the Cholesky triangle $G$ changes when $A = GG^T$ is perturbed is
analyzed in


G.W. Stewart (1977b). “Perturbation Bounds for the QR Factorization of a Matrix,”
SIAM J. Num. Anal. 14, 509-18.

Z. Drmač, M. Omladič, and K. Veselić (1994). “On the Perturbation of the Cholesky
Factorization,” SIAM J. Matrix Anal. Appl. 15, 1319-1332.


Nearness / sensitivity issues associated with positive semi-definiteness and the polar de- 
composition are presented in 


N.J. Higham (1988). “Computing a Nearest Symmetric Positive Semidefinite Matrix,”
Lin. Alg. and Its Applic. 103, 103-118.

R. Mathias (1993). “Perturbation Bounds for the Polar Decomposition,” SIAM J. Matrix
Anal. Appl. 14, 588-597.

R-C. Li (1995). “New Perturbation Bounds for the Unitary Polar Factor,” SIAM J.
Matrix Anal. Appl. 16, 327-332.


Computationally-oriented references for the polar decomposition and the square root are
given in §8.6 and §11.2 respectively.


4.3 Banded Systems


In many applications that involve linear systems, the matrix of coefficients
is banded. This is the case whenever the equations can be ordered so that
each unknown $x_i$ appears in only a few equations in a "neighborhood" of
the $i$th equation. Formally, we say that $A = (a_{ij})$ has upper bandwidth $q$
if $a_{ij} = 0$ whenever $j > i + q$ and lower bandwidth $p$ if $a_{ij} = 0$ whenever
$i > j + p$. Substantial economies can be realized when solving banded
systems because the triangular factors in $LU$, $GG^T$, $LDM^T$, etc., are also
banded.

Before proceeding the reader is advised to review §1.2 where several
aspects of band matrix manipulation are discussed.


4.3.1 Band LU Factorization 


Our first result shows that if $A$ is banded and $A = LU$ then $L$ ($U$) inherits
the lower (upper) bandwidth of $A$.


Theorem 4.3.1 Suppose $A \in \mathbb{R}^{n \times n}$ has an LU factorization $A = LU$. If $A$
has upper bandwidth $q$ and lower bandwidth $p$, then $U$ has upper bandwidth
$q$ and $L$ has lower bandwidth $p$.


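Proof. The proof is by induction on $n$. From (3.2.6) we have the factorization

$$
A = \begin{bmatrix} \alpha & w^T \\ v & B \end{bmatrix}
  = \begin{bmatrix} 1 & 0 \\ v/\alpha & I_{n-1} \end{bmatrix}
    \begin{bmatrix} 1 & 0 \\ 0 & B - vw^T/\alpha \end{bmatrix}
    \begin{bmatrix} \alpha & w^T \\ 0 & I_{n-1} \end{bmatrix} .
$$

It is clear that $B - vw^T/\alpha$ has upper bandwidth $q$ and lower bandwidth $p$
because only the first $q$ components of $w$ and the first $p$ components of $v$
are nonzero. Let $L_1U_1$ be the LU factorization of this matrix. Using the
induction hypothesis and the sparsity of $w$ and $v$, it follows that

$$
L = \begin{bmatrix} 1 & 0 \\ v/\alpha & L_1 \end{bmatrix}
\qquad \text{and} \qquad
U = \begin{bmatrix} \alpha & w^T \\ 0 & U_1 \end{bmatrix}
$$

have the desired bandwidth properties and satisfy $A = LU$. $\Box$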


The specialization of Gaussian elimination to banded matrices having an 
LU factorization is straightforward. 


Algorithm 4.3.1 (Band Gaussian Elimination: Outer Product Version)
Given $A \in \mathbb{R}^{n \times n}$ with upper bandwidth $q$ and lower bandwidth $p$,
the following algorithm computes the factorization $A = LU$, assuming it
exists. $A(i,j)$ is overwritten by $L(i,j)$ if $i > j$ and by $U(i,j)$ otherwise.


for k = 1:n-1
    for i = k+1:min(k+p, n)
        A(i,k) = A(i,k)/A(k,k)
    end
    for j = k+1:min(k+q, n)
        for i = k+1:min(k+p, n)
            A(i,j) = A(i,j) - A(i,k)A(k,j)
        end
    end
end


If $n \gg p$ and $n \gg q$ then this algorithm involves about $2npq$ flops. Band
versions of Algorithm 4.1.1 ($LDM^T$) and all the Cholesky procedures also
exist, but we leave their formulation to the exercises.
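The loop bounds of Algorithm 4.3.1 can be made concrete with a minimal Python sketch. It assumes 0-based indexing, a dense list-of-lists array standing in for $A$, and nonzero pivots throughout; the names `band_lu` and `unpack_lu` are illustrative and do not come from the text.

```python
def band_lu(A, p, q):
    """Overwrite A (a list of row lists) with the LU factors of a band matrix.

    p and q are the lower and upper bandwidths. No pivoting is performed,
    so every pivot A[k][k] is assumed to be nonzero.
    """
    n = len(A)
    for k in range(n - 1):
        last_row = min(k + p, n - 1)   # rows below the band hold zeros in column k
        last_col = min(k + q, n - 1)   # columns right of the band hold zeros in row k
        for i in range(k + 1, last_row + 1):
            A[i][k] /= A[k][k]         # multipliers, i.e. column k of L
        for j in range(k + 1, last_col + 1):
            for i in range(k + 1, last_row + 1):
                A[i][j] -= A[i][k] * A[k][j]
    return A


def unpack_lu(A):
    """Split the overwritten array into unit lower triangular L and upper triangular U."""
    n = len(A)
    L = [[1.0 if i == j else (A[i][j] if i > j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[A[i][j] if i <= j else 0.0 for j in range(n)] for i in range(n)]
    return L, U
```

Only the entries inside the band are ever touched, which is where the $2npq$ flop estimate comes from. A serious implementation would of course use band storage rather than a dense array.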


4.3.2 Band Triangular System Solving


Analogous savings can also be made when solving banded triangular sys- 
tems. 


Algorithm 4.3.2 (Band Forward Substitution: Column Version)
Let $L \in \mathbb{R}^{n \times n}$ be a unit lower triangular matrix having lower bandwidth
$p$. Given $b \in \mathbb{R}^n$, the following algorithm overwrites $b$ with the solution to
$Lx = b$.


for j = 1:n
    for i = j+1:min(j+p, n)
        b(i) = b(i) - L(i,j)b(j)
    end
end


If $n \gg p$ then this algorithm requires about $2np$ flops.
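As a small illustration, the column-oriented forward substitution above might be written in Python as follows. This is only a sketch: indices are 0-based, $L$ is held as a dense list of rows, and the function name is our own.

```python
def band_forward_substitution(L, b, p):
    """Overwrite b with the solution of L x = b.

    L is unit lower triangular with lower bandwidth p, so column j has
    at most p nonzeros below the diagonal.
    """
    n = len(b)
    for j in range(n):
        for i in range(j + 1, min(j + p, n - 1) + 1):
            b[i] -= L[i][j] * b[j]     # remove the contribution of x_j
    return b
```

For example, with $p = 2$ and $n = 4$ the inner loop visits at most two rows per column instead of the $n - j$ rows a full triangular solve would touch.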


Algorithm 4.3.3 (Band Back-Substitution: Column Version) Let
$U \in \mathbb{R}^{n \times n}$ be a nonsingular upper triangular matrix having upper bandwidth
$q$. Given $b \in \mathbb{R}^n$, the following algorithm overwrites $b$ with the solution
to $Ux = b$.


for j = n:-1:1
    b(j) = b(j)/U(j,j)
    for i = max(1, j-q):j-1
        b(i) = b(i) - U(i,j)b(j)
    end
end


If $n \gg q$ then this algorithm requires about $2nq$ flops.


4.3.3 Band Gaussian Elimination with Pivoting


Gaussian elimination with partial pivoting can also be specialized to exploit
band structure in $A$. If, however, $PA = LU$, then the band properties of $L$
and $U$ are not quite so simple. For example, if $A$ is tridiagonal and the first
two rows are interchanged at the very first step of the algorithm, then $u_{13}$
is nonzero. Consequently, row interchanges expand bandwidth. Precisely
how the band enlarges is the subject of the following theorem.


Theorem 4.3.2 Suppose $A \in \mathbb{R}^{n \times n}$ is nonsingular and has upper and lower
bandwidths $q$ and $p$, respectively. If Gaussian elimination with partial pivoting
is used to compute Gauss transformations

$$ M_j = I - \alpha^{(j)} e_j^T, \qquad j = 1:n-1 $$

and permutations $P_1, \ldots, P_{n-1}$ such that $M_{n-1}P_{n-1} \cdots M_1P_1A = U$ is upper
triangular, then $U$ has upper bandwidth $p + q$ and $\alpha_i^{(j)} = 0$ whenever
$i \le j$ or $i > j + p$.


Proof. Let $PA = LU$ be the factorization computed by Gaussian elimination
with partial pivoting and recall that $P = P_{n-1} \cdots P_1$. Write $P^T =
[e_{s_1}, \ldots, e_{s_n}]$, where $\{s_1, \ldots, s_n\}$ is a permutation of $\{1, 2, \ldots, n\}$. If $s_i > i + p$
then it follows that the leading $i$-by-$i$ principal submatrix of $PA$ is singular,
since $(PA)_{ij} = a_{s_i,j} = 0$ for $j = 1:s_i - p - 1$ and $s_i - p - 1 \ge i$. This implies
that $U$ and $A$ are singular, a contradiction. Thus, $s_i \le i + p$ for $i = 1:n$ and
therefore, $PA$ has upper bandwidth $p + q$. It follows from Theorem 4.3.1
that $U$ has upper bandwidth $p + q$.

The assertion about the $\alpha^{(j)}$ can be verified by observing that $M_j$ need
only zero elements $(j+1, j), \ldots, (j+p, j)$ of the partially reduced matrix
$P_jM_{j-1}P_{j-1} \cdots M_1P_1A$. $\Box$


Thus, pivoting destroys band structure in the sense that $U$ becomes
"wider" than $A$'s upper triangle, while nothing at all can be said about
the bandwidth of $L$. However, since the $j$th column of $L$ is a permutation
of the $j$th Gauss vector $\alpha^{(j)}$, it follows that $L$ has at most $p + 1$ nonzero
elements per column.
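The widening of $U$ is easy to observe on a small tridiagonal example. The following Python sketch runs Gaussian elimination with partial pivoting in exact rational arithmetic and measures the upper bandwidth of the result. The function names and the test matrix are our own choices, and a nonsingular input is assumed.

```python
from fractions import Fraction


def gepp_upper_factor(A):
    """Return U from Gaussian elimination with partial pivoting (exact arithmetic)."""
    U = [[Fraction(x) for x in row] for row in A]
    n = len(U)
    for k in range(n - 1):
        r = max(range(k, n), key=lambda i: abs(U[i][k]))   # pivot row
        U[k], U[r] = U[r], U[k]
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return U


def upper_bandwidth(M):
    """Largest j - i over the nonzero entries on or above the diagonal."""
    n = len(M)
    return max([j - i for i in range(n) for j in range(i, n) if M[i][j] != 0],
               default=0)


A = [[1, 2, 0, 0],
     [3, 1, 2, 0],
     [0, 3, 1, 2],
     [0, 0, 3, 1]]            # tridiagonal: p = q = 1
U = gepp_upper_factor(A)
```

Here every elimination step interchanges two adjacent rows, so `upper_bandwidth(U)` comes out as $p + q = 2$ even though `upper_bandwidth(A)` is 1, in agreement with Theorem 4.3.2.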


4.3.4 Hessenberg LU 


As an example of an unsymmetric band matrix computation, we show how
Gaussian elimination with partial pivoting can be applied to factor an upper
Hessenberg matrix $H$. (Recall that if $H$ is upper Hessenberg then $h_{ij} = 0$,
$i > j + 1$.) After $k - 1$ steps of Gaussian elimination with partial pivoting
we are left with an upper Hessenberg matrix of the form:

$$
\begin{bmatrix}
\times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times \\
0 & 0 & \times & \times & \times \\
0 & 0 & \times & \times & \times \\
0 & 0 & 0 & \times & \times
\end{bmatrix}
\qquad k = 3,\ n = 5
$$

By virtue of the special structure of this matrix, we see that the next
permutation, $P_3$, is either the identity or the identity with rows 3 and 4
interchanged. Moreover, the next Gauss transformation $M_k$ has a single
nonzero multiplier in the $(k+1, k)$ position. This illustrates the $k$th step
of the following algorithm.


Algorithm 4.3.4 (Hessenberg LU) Given an upper Hessenberg matrix
$H \in \mathbb{R}^{n \times n}$, the following algorithm computes the upper triangular matrix
$M_{n-1}P_{n-1} \cdots M_1P_1H = U$ where each $P_k$ is a permutation and each $M_k$
is a Gauss transformation whose entries are bounded by unity. $H(i,k)$ is
overwritten with $U(i,k)$ if $i \le k$ and by $(M_k)_{k+1,k}$ if $i = k + 1$. An integer
vector piv(1:n-1) encodes the permutations. If $P_k = I$, then piv(k) = 0.
If $P_k$ interchanges rows $k$ and $k + 1$, then piv(k) = 1.


for k = 1:n-1
    if |H(k,k)| < |H(k+1,k)|
        piv(k) = 1; H(k,k:n) ↔ H(k+1,k:n)
    else
        piv(k) = 0
    end
    if H(k,k) ≠ 0
        t = -H(k+1,k)/H(k,k)
        for j = k+1:n
            H(k+1,j) = H(k+1,j) + tH(k,j)
        end
        H(k+1,k) = t
    end
end


This algorithm requires $n^2$ flops.
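A direct Python transcription of Algorithm 4.3.4 is sketched below, assuming 0-based indices and a dense list-of-lists $H$; the function name is ours. As in the algorithm, only columns $k$ through $n$ take part in a row interchange, so multipliers stored in earlier columns stay where they were written.

```python
def hessenberg_lu(H):
    """Overwrite upper Hessenberg H with U (upper triangle) and the multipliers
    (first subdiagonal). Returns (H, piv) with piv[k] = 1 if rows k and k+1
    were interchanged at step k, and 0 otherwise.
    """
    n = len(H)
    piv = [0] * (n - 1)
    for k in range(n - 1):
        if abs(H[k][k]) < abs(H[k + 1][k]):
            piv[k] = 1
            for j in range(k, n):                  # swap H(k, k:n) and H(k+1, k:n)
                H[k][j], H[k + 1][j] = H[k + 1][j], H[k][j]
        if H[k][k] != 0:
            t = -H[k + 1][k] / H[k][k]             # |t| <= 1 because of the pivot choice
            for j in range(k + 1, n):
                H[k + 1][j] += t * H[k][j]
            H[k + 1][k] = t                        # store the multiplier
    return H, piv
```

To solve $Hx = b$ afterwards one would apply the same interchanges and multipliers to $b$ and then back-substitute with the upper triangle, which is the subject of P4.3.2.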


4.3.5 Band Cholesky 


The rest of this section is devoted to banded $Ax = b$ problems where the
matrix $A$ is also symmetric positive definite. The fact that pivoting is
unnecessary for such matrices leads to some very compact, elegant algorithms.
In particular, it follows from Theorem 4.3.1 that if $A = GG^T$ is the
Cholesky factorization of $A$, then $G$ has the same lower bandwidth as $A$.
This leads to the following banded version of Algorithm 4.2.1, gaxpy-based
Cholesky.


Algorithm 4.3.5 (Band Cholesky: Gaxpy Version) Given a symmetric
positive definite $A \in \mathbb{R}^{n \times n}$ with bandwidth $p$, the following algorithm
computes a lower triangular matrix $G$ with lower bandwidth $p$ such that
$A = GG^T$. For all $i \ge j$, $G(i,j)$ overwrites $A(i,j)$.


for j = 1:n
    for k = max(1, j-p):j-1
        λ = min(k+p, n)
        A(j:λ, j) = A(j:λ, j) - A(j,k)A(j:λ, k)
    end
    λ = min(j+p, n)
    A(j:λ, j) = A(j:λ, j)/sqrt(A(j,j))
end


If $n \gg p$ then this algorithm requires about $n(p^2 + 3p)$ flops and $n$ square
roots. Of course, in a serious implementation an appropriate data structure
for $A$ should be used. For example, if we just store the nonzero lower
triangular part, then a $(p + 1)$-by-$n$ array would suffice. (See §1.2.6.)

If our band Cholesky procedure is coupled with appropriate band triangular
solve routines then approximately $np^2 + 7np + 2n$ flops and $n$ square
roots are required to solve $Ax = b$. For small $p$ it follows that the square
roots represent a significant portion of the computation and it is preferable
to use the $LDL^T$ approach. Indeed, a careful flop count of the steps
$A = LDL^T$, $Ly = b$, $Dz = y$, and $L^Tx = z$ reveals that $np^2 + 8np + n$ flops
and no square roots are needed.
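For readers who want to experiment, here is a minimal Python sketch of the gaxpy band Cholesky procedure. It assumes 0-based indices and a dense list-of-lists array in place of the compact $(p+1)$-by-$n$ storage; the name `band_cholesky` is illustrative.

```python
import math


def band_cholesky(A, p):
    """Overwrite the lower triangle of symmetric positive definite A
    (bandwidth p) with its Cholesky factor G, so that A = G G^T.
    The strictly upper triangle is left untouched.
    """
    n = len(A)
    for j in range(n):
        # Subtract the contributions of columns k = j-p, ..., j-1 of G.
        for k in range(max(0, j - p), j):
            lam = min(k + p, n - 1)            # last nonzero row of column k
            for i in range(j, lam + 1):
                A[i][j] -= A[j][k] * A[i][k]
        lam = min(j + p, n - 1)
        d = math.sqrt(A[j][j])
        for i in range(j, lam + 1):
            A[i][j] /= d
    return A
```

Because column $k$ of $G$ is zero below row $k + p$, each inner update touches at most $p + 1$ entries, which is the source of the $n(p^2 + 3p)$ flop estimate.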


4.3.6 Tridiagonal System Solving


As a sample narrow band $LDL^T$ solution procedure, we look at the case of
symmetric positive definite tridiagonal systems. Setting

$$
L = \begin{bmatrix}
1 & & \cdots & 0 \\
e_1 & 1 & & \vdots \\
 & \ddots & \ddots & \\
0 & \cdots & e_{n-1} & 1
\end{bmatrix}
$$

and $D = \mathrm{diag}(d_1, \ldots, d_n)$ we deduce from the equation $A = LDL^T$ that:

$$
\begin{aligned}
a_{11} &= d_1 \\
a_{k,k-1} &= e_{k-1}d_{k-1}, & k &= 2:n \\
a_{kk} &= d_k + e_{k-1}^2 d_{k-1} = d_k + e_{k-1}a_{k,k-1}, & k &= 2:n
\end{aligned}
$$

Thus, the $d_i$ and $e_i$ can be resolved as follows:


d_1 = a_11
for k = 2:n
    e_{k-1} = a_{k,k-1}/d_{k-1};  d_k = a_kk - e_{k-1} a_{k,k-1}
end


To obtain the solution to $Ax = b$ we solve $Ly = b$, $Dz = y$, and $L^Tx = z$.
With overwriting we obtain


Algorithm 4.3.6 (Symmetric, Tridiagonal, Positive Definite System
Solver) Given an $n$-by-$n$ symmetric, tridiagonal, positive definite
matrix $A$ and $b \in \mathbb{R}^n$, the following algorithm overwrites $b$ with the solution
to $Ax = b$. It is assumed that the diagonal of $A$ is stored in d(1:n) and
the superdiagonal in e(1:n-1).


for k = 2:n
    t = e(k-1); e(k-1) = t/d(k-1); d(k) = d(k) - t e(k-1)
end
for k = 2:n
    b(k) = b(k) - e(k-1)b(k-1)
end
b(n) = b(n)/d(n)
for k = n-1:-1:1
    b(k) = b(k)/d(k) - e(k)b(k+1)
end


This algorithm requires $8n$ flops.
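Algorithm 4.3.6 translates almost line for line into Python. The sketch below uses 0-based lists and, unlike the algorithm, works on copies so that the caller's arrays are not overwritten; that choice and the function name are ours.

```python
def solve_spd_tridiagonal(d, e, b):
    """Solve A x = b for symmetric positive definite tridiagonal A.

    d holds the diagonal (length n) and e the superdiagonal (length n-1).
    """
    n = len(d)
    d, e, b = list(d), list(e), list(b)
    for k in range(1, n):              # A = L D L^T; e becomes the subdiagonal of L
        t = e[k - 1]
        e[k - 1] = t / d[k - 1]
        d[k] -= t * e[k - 1]
    for k in range(1, n):              # forward solve L y = b
        b[k] -= e[k - 1] * b[k - 1]
    b[n - 1] /= d[n - 1]               # D z = y and L^T x = z, combined
    for k in range(n - 2, -1, -1):
        b[k] = b[k] / d[k] - e[k] * b[k + 1]
    return b
```

As a check, for the 4-by-4 matrix with 2 on the diagonal and -1 on the off-diagonals, the right hand side $(0, 0, 0, 5)^T$ corresponds to the solution $(1, 2, 3, 4)^T$.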


4.3.7 Vectorization Issues


The tridiagonal example brings up a sore point: narrow band problems and
vector/pipeline architectures do not mix well. The narrow band implies
short vectors. However, it is sometimes the case that large, independent
sets of such problems must be solved at the same time. Let us look at how
such a computation should be arranged in light of the issues raised in §1.4.

For simplicity, assume that we must solve the $n$-by-$n$ unit lower bidiagonal
systems

$$ A^{(k)}x^{(k)} = b^{(k)}, \qquad k = 1:m $$

and that $m \gg n$. Suppose we have arrays E(1:n-1, 1:m) and B(1:n, 1:m)
with the property that E(1:n-1, k) houses the subdiagonal of $A^{(k)}$ and
B(1:n, k) houses the $k$th right hand side $b^{(k)}$. We can overwrite $b^{(k)}$ with
the solution $x^{(k)}$ as follows:


for k = 1:m
    for i = 2:n
        B(i,k) = B(i,k) - E(i-1,k)B(i-1,k)
    end
end


The problem with this algorithm, which sequentially solves each bidiagonal
system in turn, is that the inner loop does not vectorize. This is because
of the dependence of B(i,k) on B(i-1,k). If we interchange the $k$ and $i$
loops we get


for i = 2:n
    for k = 1:m
        B(i,k) = B(i,k) - E(i-1,k)B(i-1,k)                    (4.3.1)
    end
end


Now the inner loop vectorizes well as it involves a vector multiply and a
vector add. Unfortunately, (4.3.1) is not a unit stride procedure. However,
this problem is easily rectified if we store the subdiagonals and right-hand-sides
by row. That is, we use the arrays E(1:m, 1:n-1) and B(1:m, 1:n)
and store the subdiagonal of $A^{(k)}$ in E(k, 1:n-1) and $b^{(k)}$ in B(k, 1:n).
The computation (4.3.1) then transforms to


for i = 2:n
    for k = 1:m
        B(k,i) = B(k,i) - E(k,i-1)B(k,i-1)
    end
end


illustrating once again the effect of data structure on performance.


4.3.8 Band Matrix Data Structures 


The above algorithms are written as if the matrix $A$ is conventionally stored
in an $n$-by-$n$ array. In practice, a band linear equation solver would be
organized around a data structure that takes advantage of the many zeroes
in $A$. Recall from §1.2.6 that if $A$ has lower bandwidth $p$ and upper bandwidth
$q$ it can be represented in a $(p + q + 1)$-by-$n$ array A.band where
band entry $a_{ij}$ is stored in A.band(i-j+q+1, j). In this arrangement, the
nonzero portion of $A$'s $j$th column is housed in the $j$th column of A.band.
Another possible band matrix data structure that we discussed in §1.2.8
involves storing $A$ by diagonal in a 1-dimensional array A.diag. Regardless
of the data structure adopted, the design of a matrix computation with a
band storage arrangement requires care in order to minimize subscripting
overheads.
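The index mapping for the A.band arrangement can be illustrated with a short Python sketch. With 0-based indices the entry $a_{ij}$ lands in row $i - j + q$ of the band array; the helper names `to_band` and `band_get` are hypothetical and introduced only for this illustration.

```python
def to_band(A, p, q):
    """Pack a dense matrix with lower bandwidth p and upper bandwidth q
    into a (p+q+1)-by-n array, one column of A per column of the result.
    """
    n = len(A)
    band = [[0.0] * n for _ in range(p + q + 1)]
    for j in range(n):
        for i in range(max(0, j - q), min(n - 1, j + p) + 1):
            band[i - j + q][j] = A[i][j]
    return band


def band_get(band, p, q, i, j):
    """Return a_ij from the packed array; entries outside the band are zero."""
    if -q <= i - j <= p:
        return band[i - j + q][j]
    return 0.0
```

In this layout the main diagonal occupies row $q$ of the packed array, the superdiagonals sit above it and the subdiagonals below it. A few positions in the corners of the packed array are never referenced.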


Problems 


P4.3.1 Derive a banded $LDM^T$ procedure similar to Algorithm 4.3.1.
P4.3.2 Show how the output of Algorithm 4.3.4 can be used to solve the upper Hessenberg
system $Hx = b$.
P4.3.3 Give an algorithm for solving an unsymmetric tridiagonal system $Ax = b$ that
uses Gaussian elimination with partial pivoting. It should require only four $n$-vectors of
floating point storage for the factorization.
P4.3.4 For $C \in \mathbb{R}^{n \times n}$ define the profile indices $m(C,i) = \min\{j : c_{ij} \ne 0\}$, where
$i = 1:n$. Show that if $A = GG^T$ is the Cholesky factorization of $A$, then $m(A,i) =
m(G,i)$ for $i = 1:n$. (We say that $G$ has the same profile as $A$.)
P4.3.5 Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric positive definite with profile indices $m_i =
m(A,i)$ where $i = 1:n$. Assume that $A$ is stored in a one-dimensional array $v$ as follows:
$v = (a_{11}, a_{2,m_2}, \ldots, a_{22}, a_{3,m_3}, \ldots, a_{33}, \ldots, a_{n,m_n}, \ldots, a_{nn})$. Write an algorithm that
overwrites $v$ with the corresponding entries of the Cholesky factor $G$ and then uses this
factorization to solve $Ax = b$. How many flops are required?
P4.3.6 For $C \in \mathbb{R}^{n \times n}$ define $p(C,i) = \max\{j : c_{ij} \ne 0\}$. Suppose that $A \in \mathbb{R}^{n \times n}$ has an
LU factorization $A = LU$ and that:

$$ m(A,1) \le m(A,2) \le \cdots \le m(A,n) $$

$$ p(A,1) \le p(A,2) \le \cdots \le p(A,n) $$

Show that $m(A,i) = m(L,i)$ and $p(A,i) = p(U,i)$ for $i = 1:n$. Recall the definition of
$m(A,i)$ from P4.3.4.
P4.3.7 Develop a gaxpy version of Algorithm 4.3.1.
P4.3.8 Develop a unit stride, vectorizable algorithm for solving the symmetric positive
definite tridiagonal systems $A^{(k)}x^{(k)} = b^{(k)}$. Assume that the diagonals, superdiagonals,
and right hand sides are stored by row in arrays D, E, and B and that $b^{(k)}$ is overwritten
with $x^{(k)}$.
P4.3.9 Develop a version of Algorithm 4.3.1 in which $A$ is stored by diagonal.
P4.3.10 Give an example of a 3-by-3 symmetric positive definite matrix whose tridiagonal
part is not positive definite.
P4.3.11 Consider the $Ax = b$ problem where

$$
A = \begin{bmatrix}
2 & -1 & 0 & \cdots & 0 & -1 \\
-1 & 2 & -1 & \cdots & 0 & 0 \\
0 & -1 & 2 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 2 & -1 \\
-1 & 0 & 0 & \cdots & -1 & 2
\end{bmatrix} .
$$

This kind of matrix arises in boundary value problems with periodic boundary conditions.
(a) Show $A$ is singular. (b) Give conditions that $b$ must satisfy for there to exist a solution
and specify an algorithm for solving it. (c) Assume that $n$ is even and consider the
permutation

$$ P = [\, e_1 \ \ e_n \ \ e_2 \ \ e_{n-1} \ \ e_3 \ \cdots \,] $$

where $e_k$ is the $k$th column of $I_n$. Describe the transformed system $P^TAP(P^Tx) = P^Tb$
and show how to solve it. Assume that there is a solution and ignore pivoting.


Notes and References for Sec. 4.3


4.4 Symmetric Indefinite Systems


A symmetric matrix whose quadratic form $x^TAx$ takes on both positive and
negative values is called indefinite. Although an indefinite $A$ may have an
$LDL^T$ factorization, the entries in the factors can have arbitrary magnitude:

$$
\begin{bmatrix} \epsilon & 1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 1/\epsilon & 1 \end{bmatrix}
  \begin{bmatrix} \epsilon & 0 \\ 0 & -1/\epsilon \end{bmatrix}
  \begin{bmatrix} 1 & 0 \\ 1/\epsilon & 1 \end{bmatrix}^T .
$$

Of course, any of the pivot strategies in §3.4 could be invoked. However,
they destroy symmetry and with it, the chance for a "Cholesky speed"
indefinite system solver. Symmetric pivoting, i.e., data reshufflings of the
form $A \leftarrow PAP^T$, must be used as we discussed in §4.2.9. Unfortunately,
symmetric pivoting does not always stabilize the $LDL^T$ computation. If $\epsilon_1$
and $\epsilon_2$ are small then regardless of $P$, the matrix

$$ \tilde{A} = P \begin{bmatrix} \epsilon_1 & 1 \\ 1 & \epsilon_2 \end{bmatrix} P^T $$

has small diagonal entries and large numbers surface in the factorization.
With symmetric pivoting, the pivots are always selected from the diagonal
and trouble results if these numbers are small relative to what must be
zeroed off the diagonal. Thus, $LDL^T$ with symmetric pivoting cannot be
recommended as a reliable approach to symmetric indefinite system solving.
It seems that the challenge is to involve the off-diagonal entries in the
pivoting process while at the same time maintaining symmetry.
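The growth in the 2-by-2 example is easy to reproduce numerically. The following Python sketch computes the unpivoted $LDL^T$ factors of a symmetric 2-by-2 matrix; the function name and the choice $\epsilon = 2^{-30}$ are ours.

```python
def ldl_2x2(a11, a21, a22):
    """Unpivoted LDL^T of [[a11, a21], [a21, a22]]; returns (l21, d1, d2)."""
    l21 = a21 / a11
    d2 = a22 - l21 * a21
    return l21, a11, d2


eps = 2.0 ** -30
growth_plain = ldl_2x2(eps, 1.0, 0.0)    # the matrix [[eps, 1], [1, 0]]
growth_sym = ldl_2x2(eps, 1.0, eps)      # small diagonal on both sides
```

In both cases the multiplier is $1/\epsilon$ and the second pivot has magnitude about $1/\epsilon$. Interchanging the two diagonal entries, which is all a symmetric permutation can do here, leaves the situation unchanged.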
In this section we discuss two ways to do this. The first method is due
to Aasen (1971) and it computes the factorization

$$ PAP^T = LTL^T \tag{4.4.1} $$

where $L = (\ell_{ij})$ is unit lower triangular and $T$ is tridiagonal. $P$ is a permutation
chosen such that $|\ell_{ij}| \le 1$. In contrast, the diagonal pivoting method
due to Bunch and Parlett (1971) computes a permutation $P$ such that

$$ PAP^T = LDL^T \tag{4.4.2} $$

where $D$ is a direct sum of 1-by-1 and 2-by-2 pivot blocks. Again, $P$ is
chosen so that the entries in the unit lower triangular $L$ satisfy $|\ell_{ij}| \le 1$.
Both factorizations involve $n^3/3$ flops and once computed, can be used to
solve $Ax = b$ with $O(n^2)$ work:

$$ PAP^T = LTL^T,\ \ Lz = Pb,\ \ Tw = z,\ \ L^Ty = w,\ \ x = P^Ty \ \ \Rightarrow\ \ Ax = b $$

$$ PAP^T = LDL^T,\ \ Lz = Pb,\ \ Dw = z,\ \ L^Ty = w,\ \ x = P^Ty \ \ \Rightarrow\ \ Ax = b $$

The only thing "new" to discuss in these solution procedures are the $Tw = z$
and $Dw = z$ systems.

In Aasen's method, the symmetric indefinite tridiagonal system $Tw = z$
is solved in $O(n)$ time using band Gaussian elimination with pivoting. Note
that there is no serious price to pay for the disregard of symmetry at this
level since the overall process is $O(n^3)$.

In the diagonal pivoting approach, the $Dw = z$ system amounts to a set
of 1-by-1 and 2-by-2 symmetric indefinite systems. The 2-by-2 problems
can be handled via Gaussian elimination with pivoting. Again, there is no
harm in disregarding symmetry during this $O(n)$ phase of the calculation.

Thus, the central issue in this section is the efficient computation of the
factorizations (4.4.1) and (4.4.2).


4.4.1 The Parlett-Reid Algorithm 


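Parlett and Reid (1970) show how to compute (4.4.1) using Gauss transforms.
Their algorithm is sufficiently illustrated by displaying the $k = 2$
step for the case $n = 5$. At the beginning of this step the matrix $A$ has
been transformed to

$$
A^{(1)} = M_1P_1AP_1^TM_1^T =
\begin{bmatrix}
\alpha_1 & \beta_1 & 0 & 0 & 0 \\
\beta_1 & \alpha_2 & v_3 & v_4 & v_5 \\
0 & v_3 & \times & \times & \times \\
0 & v_4 & \times & \times & \times \\
0 & v_5 & \times & \times & \times
\end{bmatrix}
$$

where $P_1$ is a permutation chosen so that the entries in the Gauss transformation
$M_1$ are bounded by unity in modulus. Scanning the vector
$(v_3\ v_4\ v_5)^T$ for its largest entry, we now determine a 3-by-3 permutation $\tilde{P}_2$
such that

$$
\tilde{P}_2 \begin{bmatrix} v_3 \\ v_4 \\ v_5 \end{bmatrix}
= \begin{bmatrix} \tilde{v}_3 \\ \tilde{v}_4 \\ \tilde{v}_5 \end{bmatrix}
\quad \Rightarrow \quad
|\tilde{v}_3| = \max\{|\tilde{v}_3|, |\tilde{v}_4|, |\tilde{v}_5|\} .
$$

If this maximal element is zero, we set $M_2 = P_2 = I$ and proceed to the
next step. Otherwise, we set $P_2 = \mathrm{diag}(I_2, \tilde{P}_2)$ and $M_2 = I - \alpha^{(2)}e_3^T$ with

$$ \alpha^{(2)} = (\,0 \ \ 0 \ \ 0 \ \ \tilde{v}_4/\tilde{v}_3 \ \ \tilde{v}_5/\tilde{v}_3\,)^T $$

and observe that

$$
A^{(2)} = M_2P_2A^{(1)}P_2^TM_2^T =
\begin{bmatrix}
\alpha_1 & \beta_1 & 0 & 0 & 0 \\
\beta_1 & \alpha_2 & \tilde{v}_3 & 0 & 0 \\
0 & \tilde{v}_3 & \times & \times & \times \\
0 & 0 & \times & \times & \times \\
0 & 0 & \times & \times & \times
\end{bmatrix} .
$$

In general, the process continues for $n - 2$ steps leaving us with a tridiagonal
matrix

$$ T = A^{(n-2)} = (M_{n-2}P_{n-2} \cdots M_1P_1)\,A\,(M_{n-2}P_{n-2} \cdots M_1P_1)^T . $$

It can be shown that (4.4.1) holds with $P = P_{n-2} \cdots P_1$ and

$$ L = (M_{n-2}P_{n-2} \cdots M_1P_1P^T)^{-1} . $$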


Analysis of $L$ reveals that its first column is $e_1$ and that its subdiagonal
entries in column $k$ with $k > 1$ are "made up" of the multipliers in $M_{k-1}$.

The efficient implementation of the Parlett-Reid method requires care
when computing the update

$$ A^{(k)} = M_k(P_kA^{(k-1)}P_k^T)M_k^T . \tag{4.4.3} $$

To see what is involved with a minimum of notation, suppose $B = B^T$ has
order $n - k$ and that we wish to form $B_+ = (I - we_1^T)B(I - we_1^T)^T$ where
$w \in \mathbb{R}^{n-k}$ and $e_1$ is the first column of $I_{n-k}$. Such a calculation is at the
heart of (4.4.3). If we set

$$ u = Be_1 - \frac{b_{11}}{2}\,w $$

then the lower half of the symmetric matrix $B_+ = B - wu^T - uw^T$ can
be formed in $2(n - k)^2$ flops. Summing this quantity as $k$ ranges from 1
to $n - 2$ indicates that the Parlett-Reid procedure requires $2n^3/3$ flops,
twice what we would like.


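Example 4.4.1 If the Parlett-Reid algorithm is applied to

$$
A = \begin{bmatrix}
0 & 1 & 2 & 3 \\
1 & 2 & 2 & 2 \\
2 & 2 & 3 & 3 \\
3 & 2 & 3 & 4
\end{bmatrix}
$$

then

$$
\begin{aligned}
P_1 &= [\,e_1 \ e_4 \ e_3 \ e_2\,] \\
M_1 &= I_4 - (0,\ 0,\ 2/3,\ 1/3)^T e_2^T \\
P_2 &= [\,e_1 \ e_2 \ e_4 \ e_3\,] \\
M_2 &= I_4 - (0,\ 0,\ 0,\ 1/2)^T e_3^T
\end{aligned}
$$

and $PAP^T = LTL^T$, where $P = [\,e_1,\ e_3,\ e_4,\ e_2\,]$,

$$
L = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1/3 & 1 & 0 \\
0 & 2/3 & 1/2 & 1
\end{bmatrix}
\qquad \text{and} \qquad
T = \begin{bmatrix}
0 & 3 & 0 & 0 \\
3 & 4 & 2/3 & 0 \\
0 & 2/3 & 10/9 & 0 \\
0 & 0 & 0 & 1/2
\end{bmatrix} .
$$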
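The factors in Example 4.4.1 can be reproduced with a short, deliberately naive Python sketch of the Parlett-Reid reduction. It works in exact rational arithmetic, applies every Gauss transformation to full rows and columns, and makes no attempt at the symmetric update discussed above, so it illustrates the mathematics rather than an efficient implementation. The function name and the `perm` encoding (row $i$ of $P$ is $e_{\mathrm{perm}[i]}^T$, 0-based) are our own conventions.

```python
from fractions import Fraction


def parlett_reid(A):
    """Return (perm, L, T) with P A P^T = L T L^T for symmetric A."""
    n = len(A)
    T = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    perm = list(range(n))
    for k in range(n - 2):
        # Bring the largest entry of T[k+1:, k] up to row k+1.
        r = max(range(k + 1, n), key=lambda i: abs(T[i][k]))
        if T[r][k] == 0:
            continue                                   # nothing to eliminate
        if r != k + 1:
            T[k + 1], T[r] = T[r], T[k + 1]            # swap rows
            for row in T:
                row[k + 1], row[r] = row[r], row[k + 1]    # swap columns
            perm[k + 1], perm[r] = perm[r], perm[k + 1]
            for c in range(1, k + 1):                  # previously stored multipliers
                L[k + 1][c], L[r][c] = L[r][c], L[k + 1][c]
        for i in range(k + 2, n):
            m = T[i][k] / T[k + 1][k]
            L[i][k + 1] = m
            for j in range(n):
                T[i][j] -= m * T[k + 1][j]             # row operation
            for j in range(n):
                T[j][i] -= m * T[j][k + 1]             # matching column operation
    return perm, L, T


perm, L, T = parlett_reid([[0, 1, 2, 3], [1, 2, 2, 2], [2, 2, 3, 3], [3, 2, 3, 4]])
```

For the matrix of the example this yields `perm == [0, 3, 1, 2]`, which is the permutation $P = [\,e_1,\ e_3,\ e_4,\ e_2\,]$ in the 0-based row encoding, together with the $L$ and $T$ displayed above.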


4.4.2 The Method of Aasen 


An $n^3/3$ approach to computing (4.4.1) due to Aasen (1971) can be derived
by reconsidering some of the computations in the Parlett-Reid approach.
We need a notation for the tridiagonal $T$:

$$
T = \begin{bmatrix}
\alpha_1 & \beta_1 & & \cdots & 0 \\
\beta_1 & \alpha_2 & \ddots & & \vdots \\
 & \ddots & \ddots & \ddots & \\
\vdots & & \ddots & \ddots & \beta_{n-1} \\
0 & \cdots & & \beta_{n-1} & \alpha_n
\end{bmatrix} .
$$

For clarity, we temporarily ignore pivoting and assume that the factorization
$A = LTL^T$ exists where $L$ is unit lower triangular with $L(:,1) = e_1$.
Aasen's method is organized as follows:


for j = 1:n
    Compute h(1:j) where h = T L^T e_j = H e_j.
    Compute α(j).
    if j ≤ n-1
        Compute β(j).                                         (4.4.4)
    end
    if j ≤ n-2
        Compute L(j+2:n, j+1).
    end
end


Thus, the mission of the $j$th Aasen step is to compute the $j$th column of
$T$ and the $(j+1)$-st column of $L$. The algorithm exploits the fact that the
matrix $H = TL^T$ is upper Hessenberg. As can be deduced from (4.4.4),
the computation of $\alpha(j)$, $\beta(j)$, and $L(j+2:n, j+1)$ hinges upon the vector
$h(1:j) = H(1:j, j)$. Let us see why.

Consider the $j$th column of the equation $A = LH$:

$$ A(:,j) = L(:,1:j+1)\,h(1:j+1). \tag{4.4.5} $$

This says that $A(:,j)$ is a linear combination of the first $j + 1$ columns of
$L$. In particular,

$$ A(j+1:n,\ j) = L(j+1:n,\ 1:j)\,h(1:j) + L(j+1:n,\ j+1)\,h(j+1). $$

It follows that if we compute

$$ v(j+1:n) = A(j+1:n,\ j) - L(j+1:n,\ 1:j)\,h(1:j), $$

then

$$ L(j+1:n,\ j+1)\,h(j+1) = v(j+1:n). \tag{4.4.6} $$

Thus, $L(j+2:n, j+1)$ is a scaling of $v(j+2:n)$. Since $L$ is unit lower
triangular we have from (4.4.6) that

$$ v(j+1) = h(j+1) $$

and so from that same equation we obtain the following recipe for the
$(j+1)$-st column of $L$:

$$ L(j+2:n,\ j+1) = v(j+2:n)/v(j+1). $$

Note that $L(j+2:n, j+1)$ is a scaled gaxpy.
We next develop formulae for $\alpha(j)$ and $\beta(j)$. Compare the $(j,j)$ and
$(j+1,j)$ entries in the equation $H = TL^T$. With the convention $\beta(0) = 0$
we find that $h(j) = \beta(j-1)L(j,j-1) + \alpha(j)$ and $h(j+1) = \beta(j)$. Since
$h(j+1) = v(j+1)$, it follows that

$$
\begin{aligned}
\alpha(j) &= h(j) - \beta(j-1)L(j,j-1) \\
\beta(j) &= v(j+1).
\end{aligned}
$$

With these recipes we can completely describe the Aasen procedure:


for j = 1:n
    Compute h(1:j) where h = T L^T e_j.
    if j = 1 ∨ j = 2
        α(j) = h(j)
    else
        α(j) = h(j) - β(j-1)L(j,j-1)
    end
    if j ≤ n-1                                                (4.4.7)
        v(j+1:n) = A(j+1:n, j) - L(j+1:n, 1:j)h(1:j)
        β(j) = v(j+1)
    end
    if j ≤ n-2
        L(j+2:n, j+1) = v(j+2:n)/v(j+1)
    end
end


To complete the description we must detail the computation of $h(1:j)$.
From (4.4.5) it follows that

$$ A(1:j,\ j) = L(1:j,\ 1:j)\,h(1:j). \tag{4.4.8} $$

This lower triangular system can be solved for $h(1:j)$ since we know the first
$j$ columns of $L$. However, a much more efficient way to compute $H(1:j, j)$
is obtained by exploiting the $j$th column of the equation $H = TL^T$. In
particular, with the convention that $\beta(0)L(j,0) = 0$ we have

$$ h(k) = \beta(k-1)L(j,k-1) + \alpha(k)L(j,k) + \beta(k)L(j,k+1) $$

for $k = 1:j$. These are working formulae except in the case $k = j$ because
we have not yet computed $\alpha(j)$ and $\beta(j)$. However, once $h(1:j-1)$ is known
we can obtain $h(j)$ from the last row of the triangular system (4.4.8), i.e.,

$$ h(j) = A(j,j) - \sum_{k=1}^{j-1} L(j,k)\,h(k). $$

Collecting results and using a work array $\ell(1:n)$ for $L(j, 1:j)$ we see that
the computation of $h(1:j)$ in (4.4.7) can be organized as follows:


if j = 1
    h(1) = A(1,1)
elseif j = 2
    h(1) = β(1); h(2) = A(2,2)                                (4.4.9)
else
    ℓ(0) = 0; ℓ(1) = 0; ℓ(2:j-1) = L(j, 2:j-1); ℓ(j) = 1
    h(j) = A(j,j)
    for k = 1:j-1
        h(k) = β(k-1)ℓ(k-1) + α(k)ℓ(k) + β(k)ℓ(k+1)
        h(j) = h(j) - ℓ(k)h(k)
    end
end


Note that with this $O(j)$ method for computing $h(1:j)$, the gaxpy calculation
of $v(j+1:n)$ is the dominant operation in (4.4.7). During the $j$th step
this gaxpy involves about $2j(n - j)$ flops. Summing this for $j = 1:n$ shows
that Aasen's method requires $n^3/3$ flops. Thus, the Aasen and Cholesky
algorithms entail the same amount of arithmetic.
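The recurrences (4.4.7) and (4.4.9) can be followed step by step in the Python sketch below. It is an unpivoted illustration in exact rational arithmetic, so it assumes that $v(j+1) \ne 0$ whenever a column of $L$ has to be formed. The arrays are padded with a dummy index 0 so that the subscripts match the 1-based text; the function name is ours.

```python
from fractions import Fraction


def aasen_no_pivot(A):
    """Return (alpha, beta, L) with A = L T L^T and L(:,1) = e_1, no pivoting."""
    n = len(A)
    Z = Fraction(0)
    a = [[Z] * (n + 2)] + [[Z] + [Fraction(x) for x in row] + [Z] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n + 2)] for i in range(n + 2)]
    alpha, beta = [Z] * (n + 2), [Z] * (n + 2)
    for j in range(1, n + 1):
        h = [Z] * (n + 2)
        if j == 1:                                   # (4.4.9)
            h[1] = a[1][1]
        elif j == 2:
            h[1], h[2] = beta[1], a[2][2]
        else:
            ell = [Z] * (n + 2)
            ell[2:j] = L[j][2:j]
            ell[j] = Fraction(1)
            h[j] = a[j][j]
            for k in range(1, j):
                h[k] = beta[k - 1] * ell[k - 1] + alpha[k] * ell[k] + beta[k] * ell[k + 1]
                h[j] -= ell[k] * h[k]
        alpha[j] = h[j] if j <= 2 else h[j] - beta[j - 1] * L[j][j - 1]
        if j <= n - 1:                               # gaxpy for v(j+1:n)
            v = [Z] * (n + 2)
            for i in range(j + 1, n + 1):
                v[i] = a[i][j] - sum(L[i][k] * h[k] for k in range(1, j + 1))
            beta[j] = v[j + 1]
        if j <= n - 2:                               # scaled gaxpy: column j+1 of L
            for i in range(j + 2, n + 1):
                L[i][j + 1] = v[i] / v[j + 1]
    return alpha[1:n + 1], beta[1:n], [row[1:n + 1] for row in L[1:n + 1]]


B = [[0, 3, 1, 2], [3, 4, 2, 3], [1, 2, 2, 2], [2, 3, 2, 3]]
alpha, beta, L = aasen_no_pivot(B)
```

The matrix `B` is the permuted matrix $PAP^T$ of Example 4.4.1, so the unpivoted recurrences recover the same tridiagonal matrix, with diagonal $(0,\ 4,\ 10/9,\ 1/2)$ and subdiagonal $(3,\ 2/3,\ 0)$, and the same $L$.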


4.4.3 Pivoting in Aasen’s Method 


As it now stands, the columns of $L$ are scalings of the $v$-vectors in (4.4.7).
If any of these scalings are large, i.e., if any of the $v(j+1)$'s are small,
then we are in trouble. To circumvent this problem we need only permute
the largest component of $v(j+1:n)$ to the top position. Of course, this
permutation must be suitably applied to the unreduced portion of $A$ and
the previously computed portion of $L$.


Algorithm 4.4.1 (Aasen's Method) If $A \in \mathbb{R}^{n \times n}$ is symmetric then
the following algorithm computes a permutation $P$, a unit lower triangular
$L$, and a tridiagonal $T$ such that $PAP^T = LTL^T$ with $|L(i,j)| \le 1$. The
permutation $P$ is encoded in an integer vector piv. In particular, $P =
P_1 \cdots P_{n-2}$ where $P_j$ is the identity with rows piv(j) and $j + 1$ interchanged.
The diagonal and subdiagonal of $T$ are stored in α(1:n) and β(1:n-1),
respectively. Only the subdiagonal portion of $L(2:n, 2:n)$ is computed.


for j = 1:n
    Compute h(1:j) via (4.4.9).
    if j = 1 ∨ j = 2
        α(j) = h(j)
    else
        α(j) = h(j) - β(j-1)L(j,j-1)
    end
    if j ≤ n-1
        v(j+1:n) = A(j+1:n, j) - L(j+1:n, 1:j)h(1:j)
        Find q so |v(q)| = ‖v(j+1:n)‖∞ with j+1 ≤ q ≤ n.
        piv(j) = q; v(j+1) ↔ v(q); L(j+1, 2:j) ↔ L(q, 2:j)
        A(j+1, j+1:n) ↔ A(q, j+1:n)
        A(j+1:n, j+1) ↔ A(j+1:n, q)
        β(j) = v(j+1)
    end
    if j ≤ n-2
        L(j+2:n, j+1) = v(j+2:n)
        if v(j+1) ≠ 0
            L(j+2:n, j+1) = L(j+2:n, j+1)/v(j+1)
        end
    end
end


Aasen's method is stable in the same sense that Gaussian elimination with
partial pivoting is stable. That is, the exact factorization of a matrix near
$A$ is obtained provided $\|\hat{T}\|_2 / \|A\|_2 \approx 1$, where $\hat{T}$ is the computed version
of the tridiagonal matrix $T$. In general, this is almost always the case.

In a practical implementation of the Aasen algorithm, the lower triangular
portion of $A$ would be overwritten with $L$ and $T$. Here is the $n = 5$
case:

$$
A \leftarrow \begin{bmatrix}
\alpha_1 & & & & \\
\beta_1 & \alpha_2 & & & \\
\ell_{32} & \beta_2 & \alpha_3 & & \\
\ell_{42} & \ell_{43} & \beta_3 & \alpha_4 & \\
\ell_{52} & \ell_{53} & \ell_{54} & \beta_4 & \alpha_5
\end{bmatrix}
$$

Notice that the columns of $L$ are shifted left in this arrangement.


4.4.4 Diagonal Pivoting Methods


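We next describe the computation of the block $LDL^T$ factorization (4.4.2).
We follow the discussion in Bunch and Parlett (1971). Suppose

$$
P_1AP_1^T = \begin{bmatrix} E & C^T \\ C & B \end{bmatrix},
\qquad E \in \mathbb{R}^{s \times s},\ \ B \in \mathbb{R}^{(n-s) \times (n-s)}
$$

where $P_1$ is a permutation matrix and $s = 1$ or 2. If $A$ is nonzero, then it is
always possible to choose these quantities so that $E$ is nonsingular thereby
enabling us to write

$$
P_1AP_1^T =
\begin{bmatrix} I_s & 0 \\ CE^{-1} & I_{n-s} \end{bmatrix}
\begin{bmatrix} E & 0 \\ 0 & B - CE^{-1}C^T \end{bmatrix}
\begin{bmatrix} I_s & E^{-1}C^T \\ 0 & I_{n-s} \end{bmatrix} .
$$

For the sake of stability, the $s$-by-$s$ "pivot" $E$ should be chosen so that the
entries in

$$ \tilde{A} = (\tilde{a}_{ij}) = B - CE^{-1}C^T \tag{4.4.10} $$

are suitably bounded. To this end, let $\alpha \in (0,1)$ be given and define the
size measures

$$ \mu_0 = \max_{i,j} |a_{ij}|, \qquad \mu_1 = \max_i |a_{ii}| . $$

The Bunch-Parlett pivot strategy is as follows:


if μ_1 ≥ α μ_0
    s = 1
    Choose P_1 so |e_11| = μ_1.
else
    s = 2
    Choose P_1 so |e_21| = μ_0.
end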

It is easy to verify from (4.4.10) that if $s = 1$ then

$$ |\tilde{a}_{ij}| \le (1 + \alpha^{-1})\,\mu_0 \tag{4.4.11} $$

while $s = 2$ implies

$$ |\tilde{a}_{ij}| \le \frac{3 - \alpha}{1 - \alpha}\,\mu_0 . \tag{4.4.12} $$

By equating $(1 + \alpha^{-1})^2$, the growth factor associated with two $s = 1$ steps,
and $(3 - \alpha)/(1 - \alpha)$, the corresponding $s = 2$ factor, Bunch and Parlett conclude
that $\alpha = (1 + \sqrt{17})/8$ is optimum from the standpoint of minimizing
the bound on element growth.
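The value of $\alpha$ can be checked by a short calculation. Equating the two growth factors and clearing denominators gives a quadratic in $\alpha$:

```latex
\left(1 + \alpha^{-1}\right)^2 = \frac{3 - \alpha}{1 - \alpha}
\;\Longleftrightarrow\;
(\alpha + 1)^2 (1 - \alpha) = \alpha^2 (3 - \alpha)
\;\Longleftrightarrow\;
4\alpha^2 - \alpha - 1 = 0 ,
\qquad\text{so}\qquad
\alpha = \frac{1 + \sqrt{17}}{8} \approx 0.6404 .
```

The other root of the quadratic is negative and so lies outside $(0,1)$. At $\alpha \approx 0.6404$ both growth factors are approximately 6.56.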

The reductions outlined above are then repeated on the $n - s$ order
symmetric matrix $\tilde{A}$. A simple induction argument establishes that the
factorization (4.4.2) exists and that $n^3/3$ flops are required if the work
associated with pivot determination is ignored.


4.4.5 Stability and Efficiency 


Diagonal pivoting with the above strategy is shown by Bunch (1971) to be
as stable as Gaussian elimination with complete pivoting. Unfortunately,
the overall process requires between $n^3/12$ and $n^3/6$ comparisons, since $\mu_0$
involves a two-dimensional search at each stage of the reduction. The actual
number of comparisons depends on the total number of 2-by-2 pivots but
in general the Bunch-Parlett method for computing (4.4.2) is considerably
slower than the technique of Aasen. See Barwell and George (1976).

This is not the case with the diagonal pivoting method of Bunch and
Kaufman (1977). In their scheme, it is only necessary to scan two columns
at each stage of the reduction. The strategy is fully illustrated by considering
the very first step in the reduction:


α = (1 + √17)/8; λ = |a_r1| = max{|a_21|, ..., |a_n1|}
if λ > 0
    if |a_11| ≥ αλ
        s = 1; P_1 = I
    else
        σ = |a_pr| = max{|a_1r|, ..., |a_{r-1,r}|, |a_{r+1,r}|, ..., |a_nr|}
        if σ|a_11| ≥ αλ²
            s = 1, P_1 = I
        elseif |a_rr| ≥ ασ
            s = 1 and choose P_1 so (P_1^T A P_1)_11 = a_rr.
        else
            s = 2 and choose P_1 so (P_1^T A P_1)_21 = a_rp.
        end
    end
end


Overall, the Bunch-Kaufman algorithm requires $n^3/3$ flops, $O(n^2)$ comparisons,
and, like all the methods of this section, $n^2/2$ storage.

Example 4.4.2 If the Bunch-Kaufman algorithm is applied to

$$
A = \begin{bmatrix} 1 & 10 & 20 \\ 10 & 1 & 30 \\ 20 & 30 & 1 \end{bmatrix}
$$

then in the first step $\lambda = 20$, $r = 3$, $\sigma = 30$, and $p = 2$. The permutation $P = [\,e_3\ e_2\ e_1\,]$
is applied giving

$$
PAP^T = \begin{bmatrix} 1 & 30 & 20 \\ 30 & 1 & 10 \\ 20 & 10 & 1 \end{bmatrix} .
$$

A 2-by-2 pivot is then used to produce the reduction

$$
PAP^T =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ .3115 & .6563 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 30 & 0 \\ 30 & 1 & 0 \\ 0 & 0 & -11.7920 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ .3115 & .6563 & 1 \end{bmatrix}^T .
$$

4.4.6 A Note on Equilibrium Systems


A very important class of symmetric indefinite matrices have the form

$$
A = \begin{bmatrix} C & B \\ B^T & 0 \end{bmatrix},
\qquad C \in \mathbb{R}^{n \times n},\ \ B \in \mathbb{R}^{n \times p} \tag{4.4.13}
$$

where $C$ is symmetric positive definite and $B$ has full column rank. These
conditions ensure that $A$ is nonsingular.

Of course, the methods of this section apply to $A$. However, they do not
exploit its structure because the pivot strategies "wipe out" the zero (2,2)
block. On the other hand, here is a tempting approach that does exploit
$A$'s block structure:


(a) Compute the Cholesky factorization of $C$, $C = GG^T$.

(b) Solve $GK = B$ for $K \in \mathbb{R}^{n \times p}$.

(c) Compute the Cholesky factorization of $K^TK = B^TC^{-1}B$, $HH^T = K^TK$.


From this it follows that

$$
A = \begin{bmatrix} G & 0 \\ K^T & H \end{bmatrix}
    \begin{bmatrix} G^T & K \\ 0 & -H^T \end{bmatrix} .
$$

In principle, this triangular factorization can be used to solve the equilibrium
system

$$
\begin{bmatrix} C & B \\ B^T & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} f \\ g \end{bmatrix} . \tag{4.4.14}
$$

However, it is clear by considering steps (b) and (c) above that the accuracy
of the computed solution depends upon $\kappa(C)$ and this quantity may be
much greater than $\kappa(A)$. The situation has been carefully analyzed and
various structure-exploiting algorithms have been proposed. A brief review
of the literature is given at the end of the section.
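Steps (a) through (c) can be sketched in a few lines of Python. The sketch uses dense list-of-lists storage and textbook Cholesky and forward substitution routines, and all function names are our own. It illustrates the block factorization only and inherits the sensitivity to $\kappa(C)$ noted above.

```python
import math


def cholesky(M):
    """Lower triangular G with M = G G^T, for symmetric positive definite M."""
    n = len(M)
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        G[j][j] = math.sqrt(M[j][j] - sum(G[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            G[i][j] = (M[i][j] - sum(G[i][k] * G[j][k] for k in range(j))) / G[j][j]
    return G


def forward_solve(G, B):
    """Solve G K = B for K, where G is lower triangular and B is n-by-p."""
    n, p = len(B), len(B[0])
    K = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for c in range(p):
            K[i][c] = (B[i][c] - sum(G[i][k] * K[k][c] for k in range(i))) / G[i][i]
    return K


def equilibrium_factors(C, B):
    """Return (G, K, H) with C = G G^T, G K = B and H H^T = K^T K."""
    G = cholesky(C)                                        # step (a)
    K = forward_solve(G, B)                                # step (b)
    n, p = len(B), len(B[0])
    KtK = [[sum(K[i][a] * K[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    H = cholesky(KtK)                                      # step (c)
    return G, K, H
```

Multiplying out the two block triangular factors gives $GG^T = C$, $GK = B$, and $K^TK - HH^T = 0$ in the (2,2) position, which confirms the displayed factorization of $A$.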

But before we close it is interesting to consider a special case of (4.4.14)
that clarifies what it means for an algorithm to be stable and illustrates
how perturbation analysis can structure the search for better methods.
In several important applications, $g = 0$, $C$ is diagonal, and the solution
subvector $y$ is of primary importance. A manipulation of (4.4.14) shows
that this vector is specified by

$$ y = (B^TC^{-1}B)^{-1}B^TC^{-1}f. \tag{4.4.15} $$

Looking at this we are again led to believe that $\kappa(C)$ should have a bearing
on the accuracy of the computed $y$. However, it can be shown that

$$ \| (B^TC^{-1}B)^{-1}B^TC^{-1} \| \le \psi_B \tag{4.4.16} $$

where the upper bound $\psi_B$ is independent of $C$, a result that (correctly)
suggests that $y$ is not sensitive to perturbations in $C$. A stable method for
computing this vector should respect this, meaning that the accuracy of
the computed $y$ should be independent of $C$. Vavasis (1994) has developed
a method with this property. It involves the careful assembly of a matrix
$V \in \mathbb{R}^{n \times (n-p)}$ whose columns are a basis for the nullspace of $B^TC^{-1}$. The
$n$-by-$n$ linear system

$$ [\, B \ \ V \,] \begin{bmatrix} y \\ q \end{bmatrix} = f $$

is then solved implying $f = By + Vq$. Thus, $B^TC^{-1}f = B^TC^{-1}By$ and
(4.4.15) holds.


Problems 


P4.4.1 Show that if all the 1-by-1 and 2-by-2 principal submatrices of an n-by-n 
symmetric matrix A are singular, then A is zero. 

P4.4.2 Show that no 2-by-2 pivots can arise in the Bunch-Kaufman algorithm if A is 
positive definite. 

P4.4.3 Arrange Algorithm 4.4.1 so that only the lower triangular portion of A is
referenced and so that $\alpha(j)$ overwrites A(j, j) for j = 1:n, $\beta(j)$ overwrites A(j + 1, j) for
j = 1:n - 1, and L(i, j) overwrites A(i, j - 1) for j = 2:n - 1 and i = j + 1:n.

P4.4.4 Suppose $A \in R^{n \times n}$ is nonsingular, symmetric, and strictly diagonally dominant.
Give an algorithm that computes the factorization

\[ \Pi A \Pi^T = \begin{bmatrix} R & 0 \\ S & -M \end{bmatrix} \begin{bmatrix} R^T & S^T \\ 0 & M^T \end{bmatrix} \]

where $R \in R^{k \times k}$ and $M \in R^{(n-k) \times (n-k)}$ are lower triangular and nonsingular and $\Pi$ is
a permutation.
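P4.4.5 Show that if

\[ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & -A_{22} \end{bmatrix}, \qquad A_{11} \in R^{n \times n}, \quad A_{22} \in R^{p \times p} \]

is symmetric with $A_{11}$ and $A_{22}$ positive definite, then it has an $LDL^T$ factorization with
the property that

\[ D = \begin{bmatrix} D_1 & 0 \\ 0 & -D_2 \end{bmatrix} \]

where $D_1 \in R^{n \times n}$ and $D_2 \in R^{p \times p}$ have positive diagonal entries.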

P4.4.6 Prove (4.4.11) and (4.4.12). 

P4.4.7 Show that $-(B^T C^{-1} B)^{-1}$ is the (2,2) block of $A^{-1}$ where A is given by (4.4.13).


P4.4.8 The point of this problem is to consider a special case of (4.4.15). Define the
matrix

\[ M(\alpha) = (B^T C^{-1} B)^{-1} B^T C^{-1} \]

where

\[ C = I_n + \alpha e_k e_k^T, \qquad \alpha > -1, \]

and $e_k = I_n(:,k)$. (Note that C is just the identity with $\alpha$ added to the (k, k) entry.)
Assume that $B \in R^{n \times p}$ has rank p and show that

\[ M(\alpha) = (B^T B)^{-1} B^T \left( I_n - \frac{\alpha}{1 + \alpha w^T w} e_k w^T \right) \]

where $w = (I_n - B(B^T B)^{-1} B^T) e_k$. Show that if $\| w \|_2 = 0$ or $\| w \|_2 = 1$, then
$\| M(\alpha) \|_2 = 1/\sigma_{\min}(B)$. Show that if $0 < \| w \|_2 < 1$, then

\[ \| M(\alpha) \|_2 \le \max\left\{ \frac{1}{1 - \| w \|_2}, \; 1 + \frac{1}{\| w \|_2} \right\} \Big/ \sigma_{\min}(B). \]

Thus, $\| M(\alpha) \|_2$ has an $\alpha$-independent upper bound.

M.T. Jones and M.L. Patrick (1993). "Bunch-Kaufman Factorization for Real Symmetric
Indefinite Banded Matrices," SIAM J. Matrix Anal. Appl. 14, 553-559.


Because "future" columns must be scanned in the pivoting process, it is awkward (but
possible) to obtain a gaxpy-rich diagonal pivoting algorithm. On the other hand, Aasen's
method is naturally rich in gaxpys. Block versions of both procedures are possible. LAPACK
uses the diagonal pivoting method. Various performance issues are discussed in


V. Barwell and J.A. George (1976). "A Comparison of Algorithms for Solving Symmetric
Indefinite Systems of Linear Equations," ACM Trans. Math. Soft. 2, 242-51.

M.T. Jones and M.L. Patrick (1994). "Factoring Symmetric Indefinite Matrices on High-
Performance Architectures," SIAM J. Matrix Anal. Appl. 15, 273-283.


Another idea for a cheap pivoting strategy utilizes error bounds based on more liberal 
interchange criteria, an idea borrowed from some work done in the area of sparse elimi- 
nation methods. See 


R. Fletcher (1976). “Factorizing Symmetric Indefinite Matrices,” Lin. Alg. and Its 
Applic. 14, 251-72. 


Before using any symmetric Ax = b solver, it may be advisable to equilibrate A. An
$O(n^2)$ algorithm for accomplishing this task is given in


J.R. Bunch (1971). “Equilibration of Symmetric Matrices in the Max-Norm," J. ACM 
18, 566-72. 


Analogues of the symmetric indefinite solvers that we have presented exist for skew- 
symmetric systems. See 


J.R. Bunch (1982). “A Note on the Stable Decomposition of Skew Symmetric Matrices,” 
Math. Comp. 158, 475-480. 


The equilibrium system literature is scattered among the several application areas where
it has an important role to play. Nice overviews with pointers to this literature include


G. Strang (1988). "A Framework for Equilibrium Equations," SIAM Review 30, 283-297.
S.A. Vavasis (1994). "Stable Numerical Algorithms for Equilibrium Systems," SIAM J.
Matrix Anal. Appl. 15, 1108-1131.


Other papers include


C.C. Paige (1979). "Fast Numerically Stable Computations for Generalized Linear Least
Squares Problems," SIAM J. Num. Anal. 15, 165-71.

A. Björck and I.S. Duff (1980). "A Direct Method for the Solution of Sparse Linear
Least Squares Problems," Lin. Alg. and Its Applic. 34, 43-67.

A. Björck (1992). "Pivoting and Stability in the Augmented System Method," Proceedings
of the 14th Dundee Conference, D.F. Griffiths and G.A. Watson (eds), Longman
Scientific and Technical, Essex, U.K.

P.D. Hough and S.A. Vavasis (1996). "Complete Orthogonal Decomposition for Weighted
Least Squares," SIAM J. Matrix Anal. Appl., to appear.


Some of these papers make use of the QR factorization and other least squares ideas
that are discussed in the next chapter and §12.1.

Problems with structure abound in matrix computations and perturbation theory 
has a key role to play in the search for stable, efficient algorithms. For equilibrium sys- 
tems, there are several results like (4.4.15) that underpin the most effective algorithms. 
See 
A. Forsgren (1995). "On Linear Least-Squares Problems with Diagonally Dominant
Weight Matrices,” Technical Report TRITA-MAT-1995-OS2, Department of Mathe- 
matics, Royal Institute of Technology, S-100 44, Stockholm, Sweden. 


and the included references. A discussion of (4.4.15) may be found in 


G.W. Stewart (1989). “On Scaled Projections and Pseudoinverses,” Lin. Alg. and Its 
Applic. 118, 189—193. 

D.P. O'Leary (1990). "On Bounds for Scaled Projections and Pseudoinverses," Lin. Alg.
and Its Applic. 132, 115-117.

M.J. Todd (1990). "A Dantzig-Wolfe-like Variant of Karmarkar's Interior-Point Linear
Programming Algorithm," Operations Research 38, 1006-1018. 


4.5 Block Systems 


In many application areas the matrices that arise have exploitable block 
structure. As a case study we have chosen to discuss block tridiagonal 
systems of the form 


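\[ \begin{bmatrix} D_1 & F_1 & \cdots & 0 \\ E_1 & D_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & F_{n-1} \\ 0 & \cdots & E_{n-1} & D_n \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \tag{4.5.1} \]


Here we assume that all blocks are q-by-q and that the $x_i$ and $b_i$ are in
$R^q$. In this section we discuss both a block LU approach to this problem as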
well as a divide and conquer scheme known as cyclic reduction. Kronecker 
product systems are briefly mentioned. 


4.5.1 Block Tridiagonal LU Factorization 


We begin by considering a block LU factorization for the matrix in (4.5.1). 
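Define the block tridiagonal matrices $A_k$ by

\[ A_k = \begin{bmatrix} D_1 & F_1 & \cdots & 0 \\ E_1 & D_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & F_{k-1} \\ 0 & \cdots & E_{k-1} & D_k \end{bmatrix}, \qquad k = 1{:}n. \tag{4.5.2} \]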



Comparing blocks in

\[ A_n = \begin{bmatrix} I & & \cdots & 0 \\ L_1 & I & & \vdots \\ \vdots & \ddots & \ddots & \\ 0 & \cdots & L_{n-1} & I \end{bmatrix}
\begin{bmatrix} U_1 & F_1 & \cdots & 0 \\ 0 & U_2 & \ddots & \vdots \\ \vdots & & \ddots & F_{n-1} \\ 0 & \cdots & 0 & U_n \end{bmatrix} \tag{4.5.3} \]


we formally obtain the following algorithm for the $L_i$ and $U_i$:


     U_1 = D_1
     for i = 2:n
         Solve L_{i-1} U_{i-1} = E_{i-1} for L_{i-1}.             (4.5.4)
         U_i = D_i - L_{i-1} F_{i-1}
     end


The procedure is defined so long as the $U_i$ are nonsingular. This is assured,
for example, if the matrices $A_1, \ldots, A_n$ are nonsingular.

Having computed the factorization (4.5.3), the vector x in (4.5.1) can
be obtained via block forward and back substitution:


     y_1 = b_1
     for i = 2:n
         y_i = b_i - L_{i-1} y_{i-1}
     end                                                      (4.5.5)
     Solve U_n x_n = y_n for x_n.
     for i = n-1:-1:1
         Solve U_i x_i = y_i - F_i x_{i+1} for x_i.
     end


To carry out both (4.5.4) and (4.5.5), each U; must be factored since linear 
systems involving these submatrices are solved. This could be done using 
Gaussian elimination with pivoting. However, this does not guarantee the 
stability of the overall process. To see this just consider the case when the 
block size q is unity. 
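
To make (4.5.4) and (4.5.5) concrete, here is a minimal Python sketch. It rests on our own assumptions: blocks are stored as lists of rows, the right-hand side blocks are q-by-1 column matrices, the helper names are hypothetical, and each system with $U_i$ is solved by Gaussian elimination with partial pivoting, which (as just noted) does not by itself guarantee stability of the overall process.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matsub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(r) for r in zip(*A)]

def solve(A, B):
    """Solve A X = B (A is q-by-q) by Gaussian elimination with partial pivoting."""
    q, m = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(q)]      # augmented matrix [A | B]
    for c in range(q):
        piv = max(range(c, q), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, q):
            t = M[r][c] / M[c][c]
            for j in range(c, q + m):
                M[r][j] -= t * M[c][j]
    X = [[0.0] * m for _ in range(q)]
    for j in range(m):
        for i in reversed(range(q)):
            s = M[i][q + j] - sum(M[i][k] * X[k][j] for k in range(i + 1, q))
            X[i][j] = s / M[i][i]
    return X

def block_tridiag_solve(D, E, F, b):
    """D: n diagonal blocks, E: n-1 subdiagonal blocks, F: n-1 superdiagonal
    blocks, b: n right-hand side blocks (q-by-1). Indices are 0-based."""
    n = len(D)
    U, L = [D[0]], []
    for i in range(1, n):                                 # (4.5.4)
        # L U_{i-1} = E_{i-1} is equivalent to U_{i-1}^T L^T = E_{i-1}^T
        Li = transpose(solve(transpose(U[i - 1]), transpose(E[i - 1])))
        L.append(Li)
        U.append(matsub(D[i], matmul(Li, F[i - 1])))
    y = [b[0]]                                            # (4.5.5), forward part
    for i in range(1, n):
        y.append(matsub(b[i], matmul(L[i - 1], y[i - 1])))
    x = [None] * n                                        # (4.5.5), back substitution
    x[n - 1] = solve(U[n - 1], y[n - 1])
    for i in range(n - 2, -1, -1):
        x[i] = solve(U[i], matsub(y[i], matmul(F[i], x[i + 1])))
    return x
```

A production code would factor each $U_i$ once and reuse the factors in both phases; the sketch re-solves from scratch to stay short.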


4.5.2 Block Diagonal Dominance 


In order to obtain satisfactory bounds on the $L_i$ and $U_i$ it is necessary
to make additional assumptions about the underlying block matrix. For
example, if for i = 1:n we have the block diagonal dominance relations


\[ \| D_i^{-1} \|_1 \left( \| F_{i-1} \|_1 + \| E_i \|_1 \right) < 1, \qquad E_n \equiv F_0 \equiv 0 \tag{4.5.6} \]

then the factorization (4.5.3) exists and it is possible to show that the $L_i$
and $U_i$ satisfy the inequalities


\[ \| L_i \|_1 \le 1 \tag{4.5.7} \]
\[ \| U_i \|_1 \le \| A_n \|_1 \tag{4.5.8} \]


4.5.3 Block Versus Band Solving 


At this point it is reasonable to ask why we do not simply regard the matrix 
A in (4.5.1) as a qn-by-qn matrix having scalar entries and bandwidth 
2q - 1. Band Gaussian elimination as described in §4.3 could be applied.
The effectiveness of this course of action depends on such things as the 
dimensions of the blocks and the sparsity patterns within each block. 


To illustrate this in a very simple setting, suppose that we wish to solve 


\[ \begin{bmatrix} D_1 & F_1 \\ E_1 & D_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \tag{4.5.9} \]


where $D_1$ and $D_2$ are diagonal and $F_1$ and $E_1$ are tridiagonal. Assume
that each of these blocks is n-by-n and that it is "safe" to solve (4.5.9) via
(4.5.3) and (4.5.5). Note that


     U_1 = D_1                          (diagonal)
     L_1 = E_1 U_1^{-1}                 (tridiagonal)
     U_2 = D_2 - L_1 F_1                (pentadiagonal)
     y_1 = b_1
     y_2 = b_2 - E_1 (D_1^{-1} b_1)
     U_2 x_2 = y_2
     D_1 x_1 = y_1 - F_1 x_2.


Consequently, some very simple n-by-n calculations with the original banded
blocks render the solution.

On the other hand, the naive application of band Gaussian elimination
to the system (4.5.9) would entail a great deal of unnecessary work and
storage as the system has bandwidth n + 1. However, we mention that by
permuting the rows and columns of the system via the permutation


\[ P = [\, e_1, e_{n+1}, e_2, \ldots, e_n, e_{2n} \,]^T \tag{4.5.10} \]

we find (in the n = 5 case) that


            x x 0 x 0 0 0 0 0 0
            x x x 0 0 0 0 0 0 0
            0 x x x 0 x 0 0 0 0
            x 0 x x x 0 0 0 0 0
   PAP^T =  0 0 0 x x x 0 x 0 0
            0 0 x 0 x x x 0 0 0
            0 0 0 0 0 x x x 0 x
            0 0 0 0 x 0 x x x 0
            0 0 0 0 0 0 0 x x x
            0 0 0 0 0 0 x 0 x x



This matrix has upper and lower bandwidth equal to three and so a very 
reasonable solution procedure results by applying band Gaussian elimina- 
tion to this permuted version of A. 

The subject of bandwidth-reducing permutations is important. See 
George and Liu (1981, Chapter 4). We also refer the reader to Varah
(1972) and George (1974) for further details concerning the solution of block 
tridiagonal systems. 
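
The effect of the interleaving permutation on bandwidth is easy to check numerically. The following Python sketch (the function names are ours, and only the nonzero pattern is modeled, not the numerical values) builds the pattern of the 2-by-2 block matrix with diagonal $D_1$, $D_2$ and tridiagonal $E_1$, $F_1$, applies the ordering $e_1, e_{n+1}, e_2, e_{n+2}, \ldots$, and reports the lower and upper bandwidths.

```python
def pattern(n):
    """Nonzero pattern of [[D1, F1], [E1, D2]]: D's diagonal, E1 and F1 tridiagonal."""
    N = 2 * n
    S = [[False] * N for _ in range(N)]
    for i in range(n):
        S[i][i] = S[n + i][n + i] = True                 # D1 and D2
        for j in range(max(0, i - 1), min(n, i + 2)):
            S[i][n + j] = True                           # F1
            S[n + i][j] = True                           # E1
    return S

def bandwidths(S):
    """Return (lower, upper) bandwidth of a boolean pattern."""
    nz = [(i, j) for i, row in enumerate(S) for j, v in enumerate(row) if v]
    return max(i - j for i, j in nz), max(j - i for i, j in nz)

def interleave(S):
    """Symmetrically permute rows and columns into the order 1, n+1, 2, n+2, ..."""
    n = len(S) // 2
    order = [k for i in range(n) for k in (i, n + i)]
    return [[S[r][c] for c in order] for r in order]
```

With n = 5 the original pattern has bandwidths (6, 6), i.e. n + 1, while the interleaved pattern has bandwidths (3, 3), independent of n.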


4.5.4 Block Cyclic Reduction 


We next describe the method of block cyclic reduction that can be used 
to solve some important special instances of the block tridiagonal system 
(4.5.1). For simplicity, we assume that A has the form 


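\[ A = \begin{bmatrix} D & F & \cdots & 0 \\ F & D & \ddots & \vdots \\ \vdots & \ddots & \ddots & F \\ 0 & \cdots & F & D \end{bmatrix} \in R^{nq \times nq} \tag{4.5.11} \]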


where F and D are q-by-q matrices that satisfy DF = FD. We also assume
that $n = 2^k - 1$. These conditions hold in certain important applications
such as the discretization of Poisson's equation on a rectangle. In that
situation,

\[ D = \begin{bmatrix} 4 & -1 & \cdots & 0 \\ -1 & 4 & \ddots & \vdots \\ \vdots & \ddots & \ddots & -1 \\ 0 & \cdots & -1 & 4 \end{bmatrix} \tag{4.5.12} \]

and $F = -I_q$. The integer n is determined by the size of the mesh and can
often be chosen to be of the form $n = 2^k - 1$. (Sweet (1977) shows how to
proceed when the dimension is not of this form.)

The basic idea behind cyclic reduction is to halve the dimension of the
problem on hand repeatedly until we are left with a single q-by-q system
for the unknown subvector $x_{2^{k-1}}$. This system is then solved by standard
means. The previously eliminated $x_i$ are found by a back-substitution
process.

The general procedure is adequately motivated by considering the case
n = 7:

\[ \begin{aligned}
b_1 &= D x_1 + F x_2 \\
b_2 &= F x_1 + D x_2 + F x_3 \\
b_3 &= F x_2 + D x_3 + F x_4 \\
b_4 &= F x_3 + D x_4 + F x_5 \\
b_5 &= F x_4 + D x_5 + F x_6 \\
b_6 &= F x_5 + D x_6 + F x_7 \\
b_7 &= F x_6 + D x_7
\end{aligned} \tag{4.5.13} \]

For i = 2, 4, and 6 we multiply equations i - 1, i, and i + 1 by F, -D, and
F, respectively, and add the resulting equations to obtain

\[ \begin{aligned}
(2F^2 - D^2) x_2 + F^2 x_4 &= F(b_1 + b_3) - D b_2 \\
F^2 x_2 + (2F^2 - D^2) x_4 + F^2 x_6 &= F(b_3 + b_5) - D b_4 \\
F^2 x_4 + (2F^2 - D^2) x_6 &= F(b_5 + b_7) - D b_6
\end{aligned} \]

Thus, with this tactic we have removed the odd-indexed $x_i$ and are left
with a reduced block tridiagonal system of the form

\[ \begin{aligned}
D^{(1)} x_2 + F^{(1)} x_4 &= b_2^{(1)} \\
F^{(1)} x_2 + D^{(1)} x_4 + F^{(1)} x_6 &= b_4^{(1)} \\
F^{(1)} x_4 + D^{(1)} x_6 &= b_6^{(1)}
\end{aligned} \]

where $D^{(1)} = 2F^2 - D^2$ and $F^{(1)} = F^2$ commute. Applying the same elimination
strategy as above, we multiply these three equations respectively
by $F^{(1)}$, $-D^{(1)}$, and $F^{(1)}$. When these transformed equations are added
together, we obtain the single equation

\[ \left( 2[F^{(1)}]^2 - [D^{(1)}]^2 \right) x_4 = F^{(1)} \left( b_2^{(1)} + b_6^{(1)} \right) - D^{(1)} b_4^{(1)} \]

which we write as

\[ D^{(2)} x_4 = b^{(2)}. \]

This completes the cyclic reduction. We now solve this (small) q-by-q system
for $x_4$. The vectors $x_2$ and $x_6$ are then found by solving the systems

\[ \begin{aligned}
D^{(1)} x_2 &= b_2^{(1)} - F^{(1)} x_4 \\
D^{(1)} x_6 &= b_6^{(1)} - F^{(1)} x_4
\end{aligned} \]

Finally, we use the first, third, fifth, and seventh equations in (4.5.13) to
compute $x_1$, $x_3$, $x_5$, and $x_7$, respectively.
For general n of the form $n = 2^k - 1$ we set $D^{(0)} = D$, $F^{(0)} = F$, $b^{(0)} = b$
and compute:

     for p = 1:k-1
         F^(p) = [F^(p-1)]^2
         D^(p) = 2F^(p) - [D^(p-1)]^2
         r = 2^p
         for j = 1:2^(k-p) - 1                                              (4.5.14)
             b_{jr}^(p) = F^(p-1) ( b_{jr-r/2}^(p-1) + b_{jr+r/2}^(p-1) ) - D^(p-1) b_{jr}^(p-1)
         end
     end


The $x_i$ are then computed as follows:


     Solve D^(k-1) x_{2^(k-1)} = b_{2^(k-1)}^(k-1) for x_{2^(k-1)}.
     for p = k-2:-1:0
         r = 2^p
         for j = 1:2^(k-p-1)                                                (4.5.15)
             if j = 1
                 c = b_{(2j-1)r}^(p) - F^(p) x_{2jr}
             elseif j = 2^(k-p-1)
                 c = b_{(2j-1)r}^(p) - F^(p) x_{(2j-2)r}
             else
                 c = b_{(2j-1)r}^(p) - F^(p) ( x_{2jr} + x_{(2j-2)r} )
             end
             Solve D^(p) x_{(2j-1)r} = c for x_{(2j-1)r}
         end
     end


The amount of work required to perform these recursions depends greatly
upon the sparsity of the $D^{(p)}$ and $F^{(p)}$. In the worst case when these
matrices are full, the overall flop count has order $\log(n) q^3$. Care must be
exercised in order to ensure stability during the reduction. For further
details, see Buneman (1969).


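Example 4.5.1 Suppose q = 1, D = (4), and F = (-1) in (4.5.11) and that we wish to
solve:

\[ \begin{bmatrix}
 4 & -1 & 0 & 0 & 0 & 0 & 0 \\
-1 & 4 & -1 & 0 & 0 & 0 & 0 \\
 0 & -1 & 4 & -1 & 0 & 0 & 0 \\
 0 & 0 & -1 & 4 & -1 & 0 & 0 \\
 0 & 0 & 0 & -1 & 4 & -1 & 0 \\
 0 & 0 & 0 & 0 & -1 & 4 & -1 \\
 0 & 0 & 0 & 0 & 0 & -1 & 4
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix} =
\begin{bmatrix} 2 \\ 4 \\ 6 \\ 8 \\ 10 \\ 12 \\ 22 \end{bmatrix} \]

By executing (4.5.14) we obtain the reduced systems:

\[ \begin{bmatrix} -14 & 1 & 0 \\ 1 & -14 & 1 \\ 0 & 1 & -14 \end{bmatrix} \begin{bmatrix} x_2 \\ x_4 \\ x_6 \end{bmatrix} = \begin{bmatrix} -24 \\ -48 \\ -80 \end{bmatrix} \qquad p = 1 \]

and

\[ [\, -194 \,] \, [\, x_4 \,] = [\, -776 \,] \qquad p = 2 \]

The $x_i$ are then determined via (4.5.15):

     p = 2:   x_4 = 4
     p = 1:   x_2 = 2,  x_6 = 6
     p = 0:   x_1 = 1,  x_3 = 3,  x_5 = 5,  x_7 = 7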


Cyclic reduction is an example of a divide and conquer algorithm. Other 
divide and conquer procedures are discussed in §1.3.8 and §8.6. 
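
For readers who wish to experiment, the reduction and back-substitution phases can be sketched in Python for the scalar case q = 1, so that D and F are numbers and trivially commute. This is an illustration under our own assumptions rather than a general block implementation: the function name is hypothetical, and the three-way branch in the back-substitution is replaced by zero padding, i.e. we store $x_0 = x_{n+1} = 0$ so that one formula covers the first, last, and interior unknowns.

```python
def cyclic_reduction(D, F, b):
    """Solve tridiag(F, D, F) x = b for scalars D, F, with n = 2**k - 1 unknowns."""
    n = len(b)
    k = (n + 1).bit_length() - 1
    assert 2 ** k - 1 == n, "n must have the form 2**k - 1"
    Dp, Fp = [D], [F]                      # Dp[p] = D^(p), Fp[p] = F^(p)
    bp = [[0.0] + list(b)]                 # bp[p][i] = b_i^(p); index 0 is padding
    # Reduction phase: eliminate the unknowns that are odd multiples of r/2.
    for p in range(1, k):
        r = 2 ** p
        new = [0.0] * (n + 1)
        for j in range(1, 2 ** (k - p)):
            i = j * r
            new[i] = (Fp[p - 1] * (bp[p - 1][i - r // 2] + bp[p - 1][i + r // 2])
                      - Dp[p - 1] * bp[p - 1][i])
        bp.append(new)
        Fp.append(Fp[p - 1] ** 2)
        Dp.append(2 * Fp[p] - Dp[p - 1] ** 2)
    # Back-substitution phase; x[0] and x[n + 1] stay zero.
    x = [0.0] * (n + 2)
    mid = 2 ** (k - 1)
    x[mid] = bp[k - 1][mid] / Dp[k - 1]
    for p in range(k - 2, -1, -1):
        r = 2 ** p
        for j in range(1, 2 ** (k - p - 1) + 1):
            i = (2 * j - 1) * r
            x[i] = (bp[p][i] - Fp[p] * (x[i - r] + x[i + r])) / Dp[p]
    return x[1:n + 1]
```

Calling `cyclic_reduction(4.0, -1.0, [2, 4, 6, 8, 10, 12, 22])` reproduces the intermediate quantities -14, -194, and -776 along the way. The entries of $D^{(p)}$ grow quickly in magnitude, which hints at why care is needed for stability.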


4.5.5 Kronecker Product Systems 
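If $B \in R^{m \times n}$ and $C \in R^{p \times q}$, then their Kronecker product is given by

\[ A = B \otimes C = \begin{bmatrix} b_{11}C & b_{12}C & \cdots & b_{1n}C \\ b_{21}C & b_{22}C & \cdots & b_{2n}C \\ \vdots & \vdots & & \vdots \\ b_{m1}C & b_{m2}C & \cdots & b_{mn}C \end{bmatrix}. \]

Thus, A is an m-by-n block matrix whose (i, j) block is $b_{ij}C$. Kronecker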
products arise in conjunction with various mesh discretizations and through- 
out signal processing. Some of the more important properties that the 
Kronecker product satisfies include 


\[ (A \otimes B)(C \otimes D) = AC \otimes BD \tag{4.5.16} \]
\[ (A \otimes B)^T = A^T \otimes B^T \tag{4.5.17} \]
\[ (A \otimes B)^{-1} = A^{-1} \otimes B^{-1} \tag{4.5.18} \]


where it is assumed that all the factor operations are defined. 
Related to the Kronecker product is the “vec” operation: 


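\[ X \in R^{m \times n} \quad \Longleftrightarrow \quad \mathrm{vec}(X) = \begin{bmatrix} X(:,1) \\ \vdots \\ X(:,n) \end{bmatrix} \in R^{mn}. \]

Thus, the vec of a matrix amounts to a "stacking" of its columns. It can
be shown that

\[ Y = CXB^T \quad \Longleftrightarrow \quad \mathrm{vec}(Y) = (B \otimes C)\,\mathrm{vec}(X). \tag{4.5.19} \]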
It follows that solving a Kronecker product system,

\[ (B \otimes C) x = d \]

is equivalent to solving the matrix equation $CXB^T = D$ for X where
x = vec(X) and d = vec(D). This has efficiency ramifications. To illustrate,
suppose $B, C \in R^{n \times n}$ are symmetric positive definite. If $A = B \otimes C$ is
treated as a general matrix and factored in order to solve for x, then $O(n^6)$
flops are required since $B \otimes C \in R^{n^2 \times n^2}$. On the other hand, the solution
approach


1. Compute the Cholesky factorizations $B = GG^T$ and $C = HH^T$.
2. Solve $BZ = D^T$ for Z using G.
3. Solve $CX = Z^T$ for X using H.
4. x = vec(X).

involves $O(n^3)$ flops. Note that

\[ B \otimes C = GG^T \otimes HH^T = (G \otimes H)(G \otimes H)^T \]

is the Cholesky factorization of $B \otimes C$ because the Kronecker product of a
pair of lower triangular matrices is lower triangular. Thus, the above four-step
solution approach is a structure-exploiting Cholesky method applied
to $B \otimes C$.

We mention that if B is sparse, then $B \otimes C$ has the same sparsity at the
block level. For example, if B is tridiagonal, then $B \otimes C$ is block tridiagonal.
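
The four-step approach can be tried out with the following Python sketch. It is written under our own assumptions (matrices as lists of rows, hypothetical helper names, no error checking) and is meant only to show that the $n^2$-by-$n^2$ matrix $B \otimes C$ never has to be formed; `kron` is included solely so that a right-hand side can be generated for testing.

```python
import math

def cholesky(M):
    """Lower triangular G with M = G G^T."""
    n = len(M)
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        G[j][j] = math.sqrt(M[j][j] - sum(G[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            G[i][j] = (M[i][j] - sum(G[i][k] * G[j][k] for k in range(j))) / G[j][j]
    return G

def chol_solve(G, R):
    """Solve (G G^T) Z = R for the matrix Z, one column at a time."""
    n, m = len(G), len(R[0])
    Z = [[0.0] * m for _ in range(n)]
    for c in range(m):
        w = [0.0] * n
        for i in range(n):                         # forward: G w = R(:, c)
            w[i] = (R[i][c] - sum(G[i][k] * w[k] for k in range(i))) / G[i][i]
        for i in reversed(range(n)):               # backward: G^T z = w
            Z[i][c] = (w[i] - sum(G[k][i] * Z[k][c] for k in range(i + 1, n))) / G[i][i]
    return Z

def transpose(A):
    return [list(r) for r in zip(*A)]

def kron(B, C):
    """Dense Kronecker product (used only to build test data)."""
    return [[b * c for b in rb for c in rc] for rb in B for rc in C]

def vec(X):
    """Stack the columns of X."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]

def kron_solve(B, C, d):
    """Solve (B kron C) x = d for symmetric positive definite n-by-n B and C."""
    n = len(B)
    D = [[d[j * n + i] for j in range(n)] for i in range(n)]   # d = vec(D)
    G, H = cholesky(B), cholesky(C)                # step 1
    Z = chol_solve(G, transpose(D))                # step 2: B Z = D^T
    X = chol_solve(H, transpose(Z))                # step 3: C X = Z^T
    return vec(X)                                  # step 4
```

Steps 2 and 3 each involve n triangular solve pairs of size n, which is where the $O(n^3)$ count comes from.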


Problems


P4.5.1 Show that a block diagonally dominant matrix is nonsingular.
P4.5.2 Verify that (4.5.6) implies (4.5.7) and (4.5.8).


P4.5.3 Suppose block cyclic reduction is applied with D given by (4.5.12) and $F = -I_q$.
What can you say about the band structure of the matrices $F^{(p)}$ and $D^{(p)}$ that arise?


P4.5.4 Suppose $A \in R^{n \times n}$ is nonsingular and that we have solutions to the linear
systems Ax = b and Ay = g where $b, g \in R^n$ are given. Show how to solve the system

\[ \begin{bmatrix} A & g \\ h^T & \alpha \end{bmatrix} \begin{bmatrix} x \\ \mu \end{bmatrix} = \begin{bmatrix} b \\ \beta \end{bmatrix} \]

in O(n) flops where $\alpha, \beta \in R$ and $h \in R^n$ are given and the matrix of coefficients $A_+$ is
nonsingular. The advisability of going for such a quick solution is a complicated issue
that depends upon the condition numbers of A and $A_+$ and other factors.


P4.5.5 Verify (4.5.16)-(4.5.19).
P4.5.7 Show how to construct the SVD of $B \otimes C$ from the SVDs of B and C.
P4.5.8 If A, B, and C are matrices, then it can be shown that $(A \otimes B) \otimes C = A \otimes (B \otimes C)$
and so we just write $A \otimes B \otimes C$ for this matrix. Show how to solve the linear system
$(A \otimes B \otimes C)x = d$ assuming that A, B, and C are symmetric positive definite.
4.6 Vandermonde Systems and the FFT

Suppose $x(0{:}n) \in R^{n+1}$. A matrix $V \in R^{(n+1) \times (n+1)}$ of the form

\[ V = V(x_0, \ldots, x_n) = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_0^n & x_1^n & \cdots & x_n^n \end{bmatrix} \]

is said to be a Vandermonde matrix. In this section, we show how the
systems $V^T a = f = f(0{:}n)$ and $Vz = b = b(0{:}n)$ can be solved in $O(n^2)$
flops. The discrete Fourier transform is briefly introduced. This special and
extremely important Vandermonde system has a recursive block structure
and can be solved in O(n log n) flops. In this section, vectors and matrices
are subscripted from 0.


4.6.1 Polynomial Interpolation: $V^T a = f$


Vandermonde systems arise in many approximation and interpolation prob- 
lems. Indeed, the key to obtaining a fast Vandermonde solver is to recognize 
that solving $V^T a = f$ is equivalent to polynomial interpolation. This follows
because if $V^T a = f$ and

\[ p(x) = \sum_{j=0}^{n} a_j x^j \tag{4.6.1} \]

then $p(x_i) = f_i$ for i = 0:n.

Recall that if the $x_i$ are distinct then there is a unique polynomial of
degree n that interpolates $(x_0, f_0), \ldots, (x_n, f_n)$. Consequently, V is nonsingular
as long as the $x_i$ are distinct. We assume this throughout the
section.

The first step in computing the a; of (4.6.1) is to calculate the Newton 
representation of the interpolating polynomial p: 


\[ p(x) = \sum_{k=0}^{n} c_k \left( \prod_{i=0}^{k-1} (x - x_i) \right). \tag{4.6.2} \]

The constants $c_k$ are divided differences and may be determined as follows:


     c(0:n) = f(0:n)
     for k = 0:n-1
         for i = n:-1:k+1                                     (4.6.3)
             c_i = (c_i - c_{i-1})/(x_i - x_{i-k-1})
         end
     end


See Conte and de Boor (1980, chapter 2). 
The next task is to generate a(0:n) from c(0:n). Define the polynomials
$p_n(x), \ldots, p_0(x)$ by the iteration


     p_n(x) = c_n
     for k = n-1:-1:0
         p_k(x) = c_k + (x - x_k) p_{k+1}(x)
     end

and observe that $p_0(x) = p(x)$. Writing

\[ p_k(x) = a_k^{(k)} + a_{k+1}^{(k)} x + \cdots + a_n^{(k)} x^{n-k} \]

and equating like powers of x in the equation $p_k = c_k + (x - x_k) p_{k+1}$ gives
the following recursion for the coefficients $a_i^{(k)}$:


     a_n^(n) = c_n
     for k = n-1:-1:0
         a_k^(k) = c_k - x_k a_{k+1}^(k+1)
         for i = k+1:n-1
             a_i^(k) = a_i^(k+1) - x_k a_{i+1}^(k+1)
         end
         a_n^(k) = a_n^(k+1)
     end

Consequently, the coefficients $a_i = a_i^{(0)}$ can be calculated as follows:


     a(0:n) = c(0:n)
     for k = n-1:-1:0
         for i = k:n-1                                        (4.6.4)
             a_i = a_i - x_k a_{i+1}
         end
     end


Combining this iteration with (4.6.3) renders the following algorithm: 


Algorithm 4.6.1 Given $x(0{:}n) \in R^{n+1}$ with distinct entries and f =
f(0:n) $\in R^{n+1}$, the following algorithm overwrites f with the solution a =
a(0:n) to the Vandermonde system $V(x_0, \ldots, x_n)^T a = f$.

     for k = 0:n-1
         for i = n:-1:k+1
             f(i) = (f(i) - f(i-1))/(x(i) - x(i-k-1))
         end
     end
     for k = n-1:-1:0
         for i = k:n-1
             f(i) = f(i) - f(i+1)x(k)
         end
     end


This algorithm requires $5n^2/2$ flops.


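Example 4.6.1 Suppose Algorithm 4.6.1 is used to solve

\[ \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 3 & 9 & 27 \\ 1 & 4 & 16 & 64 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} 10 \\ 26 \\ 58 \\ 112 \end{bmatrix}. \]

The first k-loop computes the Newton representation of p(x):

\[ p(x) = 10 + 16(x - 1) + 8(x - 1)(x - 2) + (x - 1)(x - 2)(x - 3). \]

The second k-loop computes $a = [\, 4 \;\; 3 \;\; 2 \;\; 1 \,]^T$ from $[\, 10 \;\; 16 \;\; 8 \;\; 1 \,]^T$.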
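
A direct Python transcription of Algorithm 4.6.1 is short enough to be worth recording. The function name is our own choice, and the routine returns a new list instead of overwriting f.

```python
def vandermonde_dual_solve(x, f):
    """Return a with V(x_0,...,x_n)^T a = f, i.e. the monomial coefficients of the
    polynomial that interpolates the points (x_i, f_i)."""
    n = len(x) - 1
    a = list(f)
    # First k-loop: divided differences (Newton coefficients), as in (4.6.3).
    for k in range(n):
        for i in range(n, k, -1):
            a[i] = (a[i] - a[i - 1]) / (x[i] - x[i - k - 1])
    # Second k-loop: convert Newton form to monomial form, as in (4.6.4).
    for k in range(n - 1, -1, -1):
        for i in range(k, n):
            a[i] = a[i] - a[i + 1] * x[k]
    return a
```

With x = (1, 2, 3, 4) and f = (10, 26, 58, 112) the first loop leaves (10, 16, 8, 1) in the array and the routine returns (4, 3, 2, 1), i.e. $p(x) = 4 + 3x + 2x^2 + x^3$.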


4.6.2 The System Vz = b 


Now consider the system Vz = b. To derive an efficient algorithm for this 
problem, we describe what Algorithm 4.6.1 does in matrix-vector language. 
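Define the lower bidiagonal matrix $L_k(\alpha) \in R^{(n+1) \times (n+1)}$ by

\[ L_k(\alpha) = \begin{bmatrix} I_k & 0 \\ 0 & \begin{bmatrix} 1 & & & \\ -\alpha & 1 & & \\ & \ddots & \ddots & \\ & & -\alpha & 1 \end{bmatrix} \end{bmatrix} \]

and the diagonal matrix $D_k$ by

\[ D_k = \mathrm{diag}(\underbrace{1, \ldots, 1}_{k+1}, \; x_{k+1} - x_0, \ldots, x_n - x_{n-k-1}). \]

With these definitions it is easy to verify from (4.6.3) that if f = f(0:n)
and c = c(0:n) is the vector of divided differences then $c = U^T f$ where U
is the upper triangular matrix defined by

\[ U^T = D_{n-1}^{-1} L_{n-1}(1) \cdots D_0^{-1} L_0(1). \]

Similarly, from (4.6.4) we have

\[ a = L^T c, \]

where L is the unit lower triangular matrix defined by:

\[ L^T = L_0(x_0)^T \cdots L_{n-1}(x_{n-1})^T. \]

Thus, $a = L^T U^T f$ where $V^{-T} = L^T U^T$. In other words, Algorithm 4.6.1
solves $V^T a = f$ by tacitly computing the "UL" factorization of $V^{-1}$.

Consequently, the solution to the system Vz = b is given by

\[ \begin{aligned} z = V^{-1} b &= U(Lb) \\ &= \left( L_0(1)^T D_0^{-1} \cdots L_{n-1}(1)^T D_{n-1}^{-1} \right) \left( L_{n-1}(x_{n-1}) \cdots L_0(x_0) \, b \right). \end{aligned} \]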
This observation gives rise to the following algorithm:


Algorithm 4.6.2 Given $x(0{:}n) \in R^{n+1}$ with distinct entries and b =
b(0:n) $\in R^{n+1}$, the following algorithm overwrites b with the solution z =
z(0:n) to the Vandermonde system $V(x_0, \ldots, x_n) z = b$.


     for k = 0:n-1
         for i = n:-1:k+1
             b(i) = b(i) - x(k)b(i-1)
         end
     end
     for k = n-1:-1:0
         for i = k+1:n
             b(i) = b(i)/(x(i) - x(i-k-1))
         end
         for i = k:n-1
             b(i) = b(i) - b(i+1)
         end
     end


This algorithm requires $5n^2/2$ flops.


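Example 4.6.2 Suppose Algorithm 4.6.2 is used to solve

\[ \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 16 \\ 1 & 8 & 27 & 64 \end{bmatrix} \begin{bmatrix} z_0 \\ z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 3 \\ 35 \end{bmatrix}. \]

The first k-loop computes the vector

\[ L_2(3) L_1(2) L_0(1) \begin{bmatrix} 0 \\ -1 \\ 3 \\ 35 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 6 \\ 6 \end{bmatrix}. \]

The second k-loop then calculates

\[ L_0(1)^T D_0^{-1} L_1(1)^T D_1^{-1} L_2(1)^T D_2^{-1} \begin{bmatrix} 0 \\ -1 \\ 6 \\ 6 \end{bmatrix} = \begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \end{bmatrix}. \]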
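
Algorithm 4.6.2 translates to Python just as directly. As before, the function name is ours and the input list is copied, not overwritten.

```python
def vandermonde_solve(x, b):
    """Return z with V(x_0,...,x_n) z = b, where V[i][j] = x_j ** i."""
    n = len(x) - 1
    z = list(b)
    # First k-loop: apply L_{n-1}(x_{n-1}) ... L_0(x_0).
    for k in range(n):
        for i in range(n, k, -1):
            z[i] = z[i] - x[k] * z[i - 1]
    # Second k-loop: apply L_k(1)^T D_k^{-1} for k = n-1, ..., 0.
    for k in range(n - 1, -1, -1):
        for i in range(k + 1, n + 1):
            z[i] = z[i] / (x[i] - x[i - k - 1])
        for i in range(k, n):
            z[i] = z[i] - z[i + 1]
    return z
```

For x = (1, 2, 3, 4) and b = (0, -1, 3, 35) the array holds (0, -1, 6, 6) after the first loop and the routine returns (3, -4, 0, 1), which can be confirmed by multiplying back by V.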


4.6.3 Stability 


Algorithms 4.6.1 and 4.6.2 are discussed and analyzed in Björck and Pereyra 
(1970). Their experience is that these algorithms frequently produce surprisingly
accurate solutions, even when V is ill-conditioned. They also
show how to update the solution when a new coordinate pair $(x_{n+1}, f_{n+1})$
is added to the set of points to be interpolated, and how to solve confluent
Vandermonde systems, i.e., systems involving matrices like

\[ V = V(x_0, x_1, x_1, x_3) = \begin{bmatrix} 1 & 1 & 0 & 1 \\ x_0 & x_1 & 1 & x_3 \\ x_0^2 & x_1^2 & 2x_1 & x_3^2 \\ x_0^3 & x_1^3 & 3x_1^2 & x_3^3 \end{bmatrix}. \]

4.6.4 The Fast Fourier Transform 
The discrete Fourier transform (DFT) matrix of order n is defined by 


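\[ F_n = (f_{jk}), \qquad f_{jk} = \omega_n^{jk} \]

where

\[ \omega_n = \exp(-2\pi i/n) = \cos(2\pi/n) - i \sin(2\pi/n). \]

The parameter $\omega_n$ is an nth root of unity because $\omega_n^n = 1$. In the n = 4
case, $\omega_4 = -i$ and

\[ F_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \omega_4 & \omega_4^2 & \omega_4^3 \\ 1 & \omega_4^2 & \omega_4^4 & \omega_4^6 \\ 1 & \omega_4^3 & \omega_4^6 & \omega_4^9 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{bmatrix}. \]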


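If $x \in C^n$, then its DFT is the vector $F_n x$. The DFT has an extremely
important role to play throughout applied mathematics and engineering.
If n is highly composite, then it is possible to carry out the DFT in
many fewer than the $O(n^2)$ flops required by conventional matrix-vector
multiplication. To illustrate this we set $n = 2^t$ and proceed to develop
the radix-2 fast Fourier transform (FFT). The starting point is to look
at an even-order DFT matrix when we permute its columns so that the
even-indexed columns come first. Consider the case n = 8. Noting that
$\omega_8^{kj} = \omega_8^{kj \bmod 8}$ we have

\[ F_8 = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & \omega & \omega^2 & \omega^3 & \omega^4 & \omega^5 & \omega^6 & \omega^7 \\
1 & \omega^2 & \omega^4 & \omega^6 & 1 & \omega^2 & \omega^4 & \omega^6 \\
1 & \omega^3 & \omega^6 & \omega & \omega^4 & \omega^7 & \omega^2 & \omega^5 \\
1 & \omega^4 & 1 & \omega^4 & 1 & \omega^4 & 1 & \omega^4 \\
1 & \omega^5 & \omega^2 & \omega^7 & \omega^4 & \omega & \omega^6 & \omega^3 \\
1 & \omega^6 & \omega^4 & \omega^2 & 1 & \omega^6 & \omega^4 & \omega^2 \\
1 & \omega^7 & \omega^6 & \omega^5 & \omega^4 & \omega^3 & \omega^2 & \omega
\end{bmatrix}, \qquad \omega = \omega_8. \]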
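If we define the index vector c = [0 2 4 6 1 3 5 7], then

\[ F_8(:, c) = \left[ \begin{array}{cccc|cccc}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & \omega^2 & \omega^4 & \omega^6 & \omega & \omega^3 & \omega^5 & \omega^7 \\
1 & \omega^4 & 1 & \omega^4 & \omega^2 & \omega^6 & \omega^2 & \omega^6 \\
1 & \omega^6 & \omega^4 & \omega^2 & \omega^3 & \omega & \omega^7 & \omega^5 \\ \hline
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & \omega^2 & \omega^4 & \omega^6 & -\omega & -\omega^3 & -\omega^5 & -\omega^7 \\
1 & \omega^4 & 1 & \omega^4 & -\omega^2 & -\omega^6 & -\omega^2 & -\omega^6 \\
1 & \omega^6 & \omega^4 & \omega^2 & -\omega^3 & -\omega & -\omega^7 & -\omega^5
\end{array} \right]. \]

The lines through the matrix are there to help us think of $F_8(:, c)$ as a
2-by-2 matrix with 4-by-4 blocks. Noting that $\omega^2 = \omega_8^2 = \omega_4$ we see that

\[ F_8(:, c) = \begin{bmatrix} F_4 & \Omega_4 F_4 \\ F_4 & -\Omega_4 F_4 \end{bmatrix} \]

where

\[ \Omega_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \omega_8 & 0 & 0 \\ 0 & 0 & \omega_8^2 & 0 \\ 0 & 0 & 0 & \omega_8^3 \end{bmatrix}. \]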

It follows that if x is an 8-vector, then

\[ F_8 x = F_8(:, c)\, x(c) = \begin{bmatrix} F_4 & \Omega_4 F_4 \\ F_4 & -\Omega_4 F_4 \end{bmatrix} \begin{bmatrix} x(0{:}2{:}7) \\ x(1{:}2{:}7) \end{bmatrix}
= \begin{bmatrix} I & \Omega_4 \\ I & -\Omega_4 \end{bmatrix} \begin{bmatrix} F_4 x(0{:}2{:}7) \\ F_4 x(1{:}2{:}7) \end{bmatrix}. \]

Thus, by simple scalings we can obtain the 8-point DFT $y = F_8 x$ from the
4-point DFTs $y_T = F_4 x(0{:}2{:}7)$ and $y_B = F_4 x(1{:}2{:}7)$:

     y(0:3) = y_T + d .* y_B
     y(4:7) = y_T - d .* y_B

Here,

\[ d = \begin{bmatrix} 1 \\ \omega \\ \omega^2 \\ \omega^3 \end{bmatrix} \]

and ".*" indicates componentwise vector multiplication. In general, if n = 2m, then
$y = F_n x$ is given by

     y(0:m-1) = y_T + d .* y_B
     y(m:n-1) = y_T - d .* y_B

where

     d   = [1, \omega_n, ..., \omega_n^{m-1}]^T
     y_T = F_m x(0:2:n-1),
     y_B = F_m x(1:2:n-1).

For $n = 2^t$ we can recur on this process until n = 1 for which $F_1 x = x$:


     function y = FFT(x, n)
         if n = 1
             y = x
         else
             m = n/2;  \omega = e^{-2\pi i/n}
             y_T = FFT(x(0:2:n), m);  y_B = FFT(x(1:2:n), m)
             d = [1, \omega, ..., \omega^{m-1}]^T;  z = d .* y_B
             y = [ y_T + z ; y_T - z ]
         end

This is a member of the fast Fourier transform family of algorithms. It
has a nonrecursive implementation that is best presented in terms of a
factorization of $F_n$. Indeed, it can be shown that $F_n = A_t \cdots A_1 P_n$ where

\[ A_q = I_r \otimes B_L, \qquad L = 2^q, \quad r = n/L \]

with

\[ B_L = \begin{bmatrix} I_{L/2} & \Omega_{L/2} \\ I_{L/2} & -\Omega_{L/2} \end{bmatrix} \quad \text{and} \quad \Omega_{L/2} = \mathrm{diag}(1, \omega_L, \ldots, \omega_L^{L/2-1}). \]

The matrix $P_n$ is called the bit reversal permutation, the description of
which we omit. (Recall the definition of the Kronecker product "$\otimes$" from
§4.5.5.) Note that with this factorization, $y = F_n x$ can be computed as
follows:


     x = P_n x
     for q = 1:t
         L = 2^q, r = n/L                                     (4.6.5)
         x = (I_r \otimes B_L) x
     end


The matrices $A_q = (I_r \otimes B_L)$ have 2 nonzeros per row and it is this sparsity
that makes it possible to implement the DFT in O(n log n) flops. In fact,
a careful implementation involves $5n \log_2 n$ flops.
The DFT matrix has the property that

\[ F_n^{-1} = \frac{1}{n} F_n^H = \frac{1}{n} \bar{F}_n. \tag{4.6.6} \]
That is, the inverse of $F_n$ is obtained by conjugating its entries and scaling
by 1/n. A fast inverse DFT can be obtained from a (forward) FFT merely
by replacing all root-of-unity references with their complex conjugate and
dividing by n at the end.

The value of the DFT is that many "hard problems" are made simple
by transforming into Fourier space (via $F_n$). The sought-after solution
is then obtained by transforming the Fourier space solution into original
coordinates (via $F_n^{-1}$).
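
As a small illustration of the inverse transform, the sketch below reuses a forward radix-2 FFT unchanged and applies the identity $\bar{F}_n y = \overline{F_n \bar{y}}$, which is an equivalent alternative to conjugating the roots of unity inside the routine. The function names are ours and the input length is assumed to be a power of two.

```python
import cmath

def fft(x):
    """Forward radix-2 recursive FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    m = n // 2
    omega = cmath.exp(-2j * cmath.pi / n)
    top, bot = fft(x[0::2]), fft(x[1::2])
    z = [omega ** k * bot[k] for k in range(m)]
    return [top[k] + z[k] for k in range(m)] + [top[k] - z[k] for k in range(m)]

def ifft(y):
    """Inverse DFT: conj(F_n conj(y)) / n equals conj(F_n) y / n = F_n^{-1} y."""
    n = len(y)
    w = fft([complex(v).conjugate() for v in y])
    return [v.conjugate() / n for v in w]
```

Applying `ifft` to the output of `fft` returns the original vector up to roundoff.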


Problems 


P4.6.1 Show that if $V = V(x_0, \ldots, x_n)$, then

\[ \det(V) = \prod_{n \ge i > j \ge 0} (x_i - x_j). \]


P4.6.2 (Gautschi 1975a) Verify the following inequality for the n = 1 case above:

\[ \| V^{-1} \|_\infty \le \max_{0 \le k \le n} \prod_{i=0,\, i \ne k}^{n} \frac{1 + |x_i|}{|x_k - x_i|}. \]

Equality results if the $x_i$ are all on the same ray in the complex plane.
P4.6.3 Suppose $w = [\, 1, \omega_n, \omega_n^2, \ldots, \omega_n^{n/2-1} \,]$ where $n = 2^t$. Using colon notation,
express

\[ [\, 1, \omega_r, \omega_r^2, \ldots, \omega_r^{r/2-1} \,] \]

as a subvector of w where $r = 2^q$, q = 1:t.
P4.6.4 Prove (4.6.6).


P4.6.5 Expand the operation $x = (I_r \otimes B_L)x$ in (4.6.5) into a double loop and count
the number of flops required by your implementation. (Ignore the details of $x = P_n x$.)
P4.6.6 Suppose n = 3m and examine

\[ G = [\, F_n(:, 0{:}3{:}n-1) \;\; F_n(:, 1{:}3{:}n-1) \;\; F_n(:, 2{:}3{:}n-1) \,] \]

as a 3-by-3 block matrix, looking for scaled copies of $F_m$. Based on what you find,
develop a recursive radix-3 FFT analogous to the radix-2 implementation in the text.


Notes and References for Sec. 4.6
Our discussion of Vandermonde linear systems is drawn from the papers


A. Björck and V. Pereyra (1970). "Solution of Vandermonde Systems of Equations," Math.
Comp. 24, 893-903.

A. Björck and T. Elfving (1973). "Algorithms for Confluent Vandermonde Systems,"
Numer. Math. 21, 130-37.


The divided difference computations we discussed are detailed in chapter 2 of 


S.D. Conte and C. de Boor (1980). Elementary Numerical Analysis: An Algorithmic 
Approach, 3rd ed., McGraw-Hill, New York. 
The latter reference includes an Algol procedure. Error analyses of Vandermonde system
solvers include


N.J. Higham (1987b). "Error Analysis of the Björck-Pereyra Algorithms for Solving
Vandermonde Systems," Numer. Math. 50, 613-632.

N.J. Higham (1988a). "Fast Solution of Vandermonde-like Systems Involving Orthogonal
Polynomials," IMA J. Num. Anal. 8, 473-484.

N.J. Higham (1990). "Stability Analysis of Algorithms for Solving Confluent Vandermonde-
like Systems," SIAM J. Matrix Anal. Appl. 11, 23-41.

S.C. Bartels and D.J. Higham (1992). "The Structured Sensitivity of Vandermonde-Like
Systems," Numer. Math. 68, 17-34.

J.M. Varah (1993). "Errors and Perturbations in Vandermonde Systems," IMA J. Num.
Anal. 13, 1-12.


Interesting theoretical results concerning the condition of Vandermonde systems may be 
found in 


W. Gautschi (1975a). "Norm Estimates for Inverses of Vandermonde Matrices," Numer.
Math. 23, 337-47.

W. Gautschi (1975b). "Optimally Conditioned Vandermonde Matrices," Numer. Math.
24, 1-12.


The basic algorithms presented can be extended to cover confluent Vandermonde systems,
block Vandermonde systems, and Vandermonde systems that are based on other
polynomial bases:


G. Galimberti and V. Pereyra (1970). “Numerical Differentiation and the Solution of 
Multidimensional Vandermonde Systems,” Math. Comp. 24, 351—654. 

G. Galimberti and V. Pereyra (1971). “Solving Confluent Vandermonde Systems of 
Hermitian Type,” Numer. Math. 18, 44-60. 

H. Van de Vel (1977). "Numerical Treatment of a Generalized Vandermonde System of
Equations," Lin. Alg. and Its Applic. 17, 149-74.

G.H. Golub and W.P. Tang (1981). "The Block Decomposition of a Vandermonde Matrix
and Its Applications," BIT 21, 505-17.

D. Calvetti and L. Reichel (1992). "A Chebychev-Vandermonde Solver," Lin. Alg. and
Its Applic. 172, 219-229.

D. Calvetti and L. Reichel (1993). "Fast Inversion of Vandermonde-Like Matrices Involving
Orthogonal Polynomials," BIT 33, 473-484.

H. Lu (1994). "Fast Solution of Confluent Vandermonde Linear Systems," SIAM J.
Matrix Anal. Appl. 15, 1277-1288.

H. Lu (1996). "Solution of Vandermonde-like Systems and Confluent Vandermonde-like
Systems," SIAM J. Matrix Anal. Appl. 17, 127-138.


The FFT literature is very extensive and scattered. For an overview of the area couched
in Kronecker product notation, see


C.F. Van Loan (1992). Computational Frameworks for the Fast Fourier Transform,
SIAM Publications, Philadelphia, PA.


The point of view in this text is that different FFTs correspond to different factorizations
of the DFT matrix. These are sparse factorizations in that the factors have very few 
nonzeros per row. 
4.7 Toeplitz and Related Systems


Matrices whose entries are constant along each diagonal arise in many applications
and are called Toeplitz matrices. Formally, $T \in R^{n \times n}$ is Toeplitz
if there exist scalars $r_{-n+1}, \ldots, r_0, \ldots, r_{n-1}$ such that $a_{ij} = r_{j-i}$ for all i
and j. Thus,

\[ T = \begin{bmatrix} r_0 & r_1 & r_2 & r_3 \\ r_{-1} & r_0 & r_1 & r_2 \\ r_{-2} & r_{-1} & r_0 & r_1 \\ r_{-3} & r_{-2} & r_{-1} & r_0 \end{bmatrix}
= \begin{bmatrix} 3 & 1 & 7 & 6 \\ 4 & 3 & 1 & 7 \\ 0 & 4 & 3 & 1 \\ 9 & 0 & 4 & 3 \end{bmatrix} \]

is Toeplitz.


Toeplitz matrices belong to the larger class of persymmetric matrices.
We say that $B \in R^{n \times n}$ is persymmetric if it is symmetric about its northeast-
southwest diagonal, i.e., $b_{ij} = b_{n-j+1,\,n-i+1}$ for all i and j. This is equivalent
to requiring $B = EB^T E$ where $E = [\, e_n, \ldots, e_1 \,] = I_n(:, n{:}-1{:}1)$ is the
n-by-n exchange matrix, i.e.,

\[ E = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}. \]

It is easy to verify that (a) Toeplitz matrices are persymmetric and (b) the
inverse of a nonsingular Toeplitz matrix is persymmetric. In this section we
show how the careful exploitation of (b) enables us to solve Toeplitz systems
with $O(n^2)$ flops. The discussion focuses on the important case when T is
also symmetric and positive definite. Unsymmetric Toeplitz systems and
connections with circulant matrices and the discrete Fourier transform are
briefly discussed.


4.7.1 Three Problems 
Assume that we have scalars $r_1, \ldots, r_n$ such that for $k = 1:n$ the matrices 

$$T_k = \begin{bmatrix} 1 & r_1 & \cdots & r_{k-2} & r_{k-1} \\ r_1 & 1 & \ddots & & r_{k-2} \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ r_{k-2} & & \ddots & 1 & r_1 \\ r_{k-1} & r_{k-2} & \cdots & r_1 & 1 \end{bmatrix}$$

are positive definite. (There is no loss of generality in normalizing the 
diagonal.) Three algorithms are described in this section: 

• Durbin's algorithm for the Yule-Walker problem $T_n y = -[r_1, \ldots, r_n]^T$. 
• Levinson's algorithm for the general right-hand side problem $T_n x = b$. 
• Trench's algorithm for computing $B = T_n^{-1}$. 

In deriving these methods, we denote the k-by-k exchange matrix by $E_k$, 
i.e., $E_k = I_k(:, k:-1:1)$. 


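4.7.2 Solving the Yule-Walker Equations 


We begin by presenting Durbin's algorithm for the Yule-Walker equations, 
which arise in conjunction with certain linear prediction problems. Suppose 
for some $k$ that satisfies $1 \le k \le n-1$ we have solved the $k$-th order Yule-Walker 
system $T_k y = -r = -(r_1, \ldots, r_k)^T$. We now show how the $(k+1)$-st 
order Yule-Walker system 

$$\begin{bmatrix} T_k & E_k r \\ r^T E_k & 1 \end{bmatrix} \begin{bmatrix} z \\ \alpha \end{bmatrix} = -\begin{bmatrix} r \\ r_{k+1} \end{bmatrix}$$

can be solved in $O(k)$ flops. First observe that 

$$z = T_k^{-1}(-r - \alpha E_k r) = y - \alpha T_k^{-1} E_k r$$

and 

$$\alpha = -r_{k+1} - r^T E_k z.$$

Since $T_k^{-1}$ is persymmetric, $T_k^{-1} E_k = E_k T_k^{-1}$ and thus, 

$$z = y - \alpha E_k T_k^{-1} r = y + \alpha E_k y.$$

By substituting this into the above expression for $\alpha$ and using $E_k E_k = I$ we find 

$$\alpha = -r_{k+1} - r^T E_k (y + \alpha E_k y) = -(r_{k+1} + r^T E_k y)/(1 + r^T y).$$

The denominator is positive because $T_{k+1}$ is positive definite and because 

$$\begin{bmatrix} I & E_k y \\ 0 & 1 \end{bmatrix}^T \begin{bmatrix} T_k & E_k r \\ r^T E_k & 1 \end{bmatrix} \begin{bmatrix} I & E_k y \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} T_k & 0 \\ 0 & 1 + r^T y \end{bmatrix}.$$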


We have illustrated the kth step of an algorithm proposed by Durbin (1960). 
It proceeds by solving the Yule-Walker systems 

$$T_k y^{(k)} = -r^{(k)} = -[r_1, \ldots, r_k]^T$$

for $k = 1:n$ as follows: 

    y^(1) = −r_1
    for k = 1:n−1
        β_k = 1 + [r^(k)]^T y^(k)
        α_k = −(r_{k+1} + [r^(k)]^T E_k y^(k)) / β_k          (4.7.1)
        z^(k) = y^(k) + α_k E_k y^(k)
        y^(k+1) = [ z^(k) ; α_k ]
    end


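As it stands, this algorithm would require $3n^2$ flops to generate $y = y^{(n)}$. 
It is possible, however, to reduce the amount of work even further by 
exploiting some of the above expressions: 

$$\begin{aligned}
\beta_k &= 1 + [r^{(k)}]^T y^{(k)} \\
&= 1 + \begin{bmatrix} r^{(k-1)} \\ r_k \end{bmatrix}^T \begin{bmatrix} y^{(k-1)} + \alpha_{k-1} E_{k-1} y^{(k-1)} \\ \alpha_{k-1} \end{bmatrix} \\
&= \left(1 + [r^{(k-1)}]^T y^{(k-1)}\right) + \alpha_{k-1}\left([r^{(k-1)}]^T E_{k-1} y^{(k-1)} + r_k\right) \\
&= \beta_{k-1} + \alpha_{k-1}(-\beta_{k-1}\alpha_{k-1}) \\
&= (1 - \alpha_{k-1}^2)\,\beta_{k-1}.
\end{aligned}$$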


Using this recursion we obtain the following algorithm: 


Algorithm 4.7.1. (Durbin) Given real numbers $1 = r_0, r_1, \ldots, r_n$ such 
that $T = (r_{|i-j|}) \in \mathbb{R}^{n \times n}$ is positive definite, the following algorithm 
computes $y \in \mathbb{R}^n$ such that $Ty = -(r_1, \ldots, r_n)^T$. 

    y(1) = −r(1); β = 1; α = −r(1)
    for k = 1:n−1
        β = (1 − α²)β
        α = −(r(k+1) + r(k:−1:1)^T y(1:k)) / β
        z(1:k) = y(1:k) + α y(k:−1:1)
        y(1:k+1) = [ z(1:k) ; α ]
    end


This algorithm requires $2n^2$ flops. We have included an auxiliary vector $z$ 
for clarity, but it can be avoided. 
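
As an illustration only, Algorithm 4.7.1 can be sketched in plain Python. The function name `durbin` and the 0-based list layout (`r[0]` holds $r_1$) are assumptions made for this sketch, not part of the text; no check for positive definiteness is made.

```python
def durbin(r):
    """Sketch of Algorithm 4.7.1: solve T y = -r for T = (r_{|i-j|}), r_0 = 1.

    r is the list [r_1, ..., r_n]; T is assumed positive definite.
    """
    n = len(r)
    y = [-r[0]]
    beta = 1.0
    alpha = -r[0]
    for k in range(1, n):
        beta = (1.0 - alpha * alpha) * beta
        # r(k:-1:1)^T y(1:k) in the 1-based notation of the algorithm
        dot = sum(r[k - 1 - i] * y[i] for i in range(k))
        alpha = -(r[k] + dot) / beta
        # z(1:k) = y(1:k) + alpha * y(k:-1:1), then append alpha
        y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return y
```

Calling `durbin([0.5, 0.2, 0.1])` corresponds to the Yule-Walker system of Example 4.7.1.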


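Example 4.7.1 Suppose we wish to solve the Yule-Walker system 

$$\begin{bmatrix} 1 & .5 & .2 \\ .5 & 1 & .5 \\ .2 & .5 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = -\begin{bmatrix} .5 \\ .2 \\ .1 \end{bmatrix}$$

using Algorithm 4.7.1. After one pass through the loop we obtain 

$$\alpha = 1/15, \qquad \beta = 3/4, \qquad y = \begin{bmatrix} -8/15 \\ 1/15 \end{bmatrix}.$$

We then compute 

$$\begin{aligned}
\beta &= (1 - \alpha^2)\beta = 56/75 \\
\alpha &= -(r_3 + r_2 y_1 + r_1 y_2)/\beta = -1/28 \\
z_1 &= y_1 + \alpha y_2 = -225/420 \\
z_2 &= y_2 + \alpha y_1 = 36/420,
\end{aligned}$$

giving the final solution $y = [-75, 12, -5]^T/140$. 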


4.7.3 The General Right Hand Side Problem 


With a little extra work, it is possible to solve a symmetric positive definite 
Toeplitz system that has an arbitrary right-hand side. Suppose that we 
have solved the system 

$$T_k x = b = (b_1, \ldots, b_k)^T \qquad (4.7.2)$$

for some $k$ satisfying $1 \le k < n$ and that we now wish to solve 

$$\begin{bmatrix} T_k & E_k r \\ r^T E_k & 1 \end{bmatrix} \begin{bmatrix} v \\ \mu \end{bmatrix} = \begin{bmatrix} b \\ b_{k+1} \end{bmatrix}. \qquad (4.7.3)$$

Here, $r = (r_1, \ldots, r_k)^T$ as above. Assume also that the solution to the kth 
order Yule-Walker system $T_k y = -r$ is available. From $T_k v + \mu E_k r = b$ 
it follows that 

$$v = T_k^{-1}(b - \mu E_k r) = x - \mu T_k^{-1} E_k r = x + \mu E_k y$$

and so 

$$\begin{aligned}
\mu &= b_{k+1} - r^T E_k v \\
&= b_{k+1} - r^T E_k x - \mu r^T y \\
&= (b_{k+1} - r^T E_k x)/(1 + r^T y).
\end{aligned}$$

Consequently, we can effect the transition from (4.7.2) to (4.7.3) in $O(k)$ 
flops. 

Overall, we can efficiently solve the system $T_n x = b$ by solving the systems 
$T_k x^{(k)} = b^{(k)} = (b_1, \ldots, b_k)^T$ and $T_k y^{(k)} = -r^{(k)} = -(r_1, \ldots, r_k)^T$ “in 
parallel” for $k = 1:n$. This is the gist of the following algorithm: 


Algorithm 4.7.2 (Levinson) Given $b \in \mathbb{R}^n$ and real numbers $1 = r_0, r_1, \ldots, r_n$ 
such that $T = (r_{|i-j|}) \in \mathbb{R}^{n \times n}$ is positive definite, the following 
algorithm computes $x \in \mathbb{R}^n$ such that $Tx = b$. 

    y(1) = −r(1); x(1) = b(1); β = 1; α = −r(1)
    for k = 1:n−1
        β = (1 − α²)β; μ = (b(k+1) − r(1:k)^T x(k:−1:1)) / β
        v(1:k) = x(1:k) + μ y(k:−1:1)
        x(1:k+1) = [ v(1:k) ; μ ]
        if k < n−1
            α = −(r(k+1) + r(1:k)^T y(k:−1:1)) / β
            z(1:k) = y(1:k) + α y(k:−1:1)
            y(1:k+1) = [ z(1:k) ; α ]
        end
    end


This algorithm requires $4n^2$ flops. The vectors $z$ and $v$ are for clarity and 
can be avoided in a detailed implementation. 
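
The following plain-Python sketch mirrors Algorithm 4.7.2 step by step. The name `levinson` and the 0-based indexing (`r[0]` holds $r_1$, `b[0]` holds $b_1$) are assumptions of this sketch; positive definiteness of $T$ is assumed and not checked.

```python
def levinson(r, b):
    """Sketch of Algorithm 4.7.2: solve T x = b for T = (r_{|i-j|}), r_0 = 1.

    r must contain at least r_1, ..., r_{n-1}, where n = len(b).
    """
    n = len(b)
    y = [-r[0]]          # Yule-Walker solution of the current order
    x = [b[0]]           # solution of T_k x = b(1:k) for the current order
    beta = 1.0
    alpha = -r[0]
    for k in range(1, n):
        beta = (1.0 - alpha * alpha) * beta
        mu = (b[k] - sum(r[i] * x[k - 1 - i] for i in range(k))) / beta
        x = [x[i] + mu * y[k - 1 - i] for i in range(k)] + [mu]
        if k < n - 1:
            alpha = -(r[k] + sum(r[i] * y[k - 1 - i] for i in range(k))) / beta
            y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return x
```

With `r = [0.5, 0.2]` and `b = [4, -1, 3]` this reproduces the computation of Example 4.7.2.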


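Example 4.7.2 Suppose we wish to solve the symmetric positive definite Toeplitz system 

$$\begin{bmatrix} 1 & .5 & .2 \\ .5 & 1 & .5 \\ .2 & .5 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ -1 \\ 3 \end{bmatrix}$$

using the above algorithm. After one pass through the loop we obtain 

$$\alpha = 1/15, \qquad \beta = 3/4, \qquad y = \begin{bmatrix} -8/15 \\ 1/15 \end{bmatrix}, \qquad x = \begin{bmatrix} 6 \\ -4 \end{bmatrix}.$$

We then compute 

$$\begin{aligned}
\beta &= (1 - \alpha^2)\beta = 56/75, & \mu &= (b_3 - r_1 x_2 - r_2 x_1)/\beta = 285/56, \\
v_1 &= x_1 + \mu y_2 = 355/56, & v_2 &= x_2 + \mu y_1 = -376/56,
\end{aligned}$$

giving the final solution $x = [355, -376, 285]^T/56$. 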


4.7.4 Computing the Inverse 


One of the most surprising properties of a symmetric positive definite 
Toeplitz matrix $T_n$ is that its complete inverse can be calculated in $O(n^2)$ 
flops. To derive the algorithm for doing this, partition $T_n^{-1}$ as follows 

$$T_n^{-1} = \begin{bmatrix} A & Er \\ r^T E & 1 \end{bmatrix}^{-1} = \begin{bmatrix} B & v \\ v^T & \gamma \end{bmatrix} \qquad (4.7.4)$$

where $A = T_{n-1}$, $E = E_{n-1}$, and $r = (r_1, \ldots, r_{n-1})^T$. From the equation 

$$\begin{bmatrix} A & Er \\ r^T E & 1 \end{bmatrix} \begin{bmatrix} v \\ \gamma \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

it follows that $Av = -\gamma Er = -\gamma E(r_1, \ldots, r_{n-1})^T$ and $\gamma = 1 - r^T E v$. 
If $y$ solves the $(n-1)$-st order Yule-Walker system $Ay = -r$, then these 
expressions imply that 

$$\begin{aligned}
\gamma &= 1/(1 + r^T y) \\
v &= \gamma E y.
\end{aligned}$$

Thus, the last row and column of $T_n^{-1}$ are readily obtained. 
It remains for us to develop working formulae for the entries of the 
submatrix $B$ in (4.7.4). Since $AB + Erv^T = I_{n-1}$, it follows that 

$$B = A^{-1} - (A^{-1}Er)v^T = A^{-1} + \frac{vv^T}{\gamma}.$$

Now since $A = T_{n-1}$ is nonsingular and Toeplitz, its inverse is persymmetric. 
Thus, 

$$\begin{aligned}
b_{ij} &= (A^{-1})_{ij} + \frac{v_i v_j}{\gamma} \\
&= (A^{-1})_{n-j,\,n-i} + \frac{v_i v_j}{\gamma} \\
&= b_{n-j,\,n-i} - \frac{v_{n-j} v_{n-i}}{\gamma} + \frac{v_i v_j}{\gamma} \\
&= b_{n-j,\,n-i} + \frac{1}{\gamma}\left(v_i v_j - v_{n-j} v_{n-i}\right). \qquad (4.7.5)
\end{aligned}$$

This indicates that although $B$ is not persymmetric, we can readily compute 
an element $b_{ij}$ from its reflection across the northeast-southwest axis. 
Coupling this with the fact that $A^{-1}$ is persymmetric enables us to determine 
$B$ from its “edges” to its “interior.” 

Because the order of operations is rather cumbersome to describe, we 
preview the formal specification of the algorithm pictorially. To this end, 
assume that we know the last column and row of $T_n^{-1}$: 

                u u u u u k
                u u u u u k
    T_n^{-1} =  u u u u u k
                u u u u u k
                u u u u u k
                k k k k k k

Here u and k denote the unknown and the known entries respectively, and 
$n = 6$. Alternately exploiting the persymmetry of $T_n^{-1}$ and the recursion 
(4.7.5), we can compute $B$, the leading $(n-1)$-by-$(n-1)$ block of $T_n^{-1}$, as 
follows: 

                k k k k k k                k k k k k k
                k u u u u k                k u u u k k
    persym.     k u u u u k    (4.7.5)     k u u u k k
    ------->    k u u u u k    ------->    k u u u k k
                k u u u u k                k k k k k k
                k k k k k k                k k k k k k

                k k k k k k                k k k k k k
                k k k k k k                k k k k k k
    persym.     k k u u k k    (4.7.5)     k k u k k k
    ------->    k k u u k k    ------->    k k k k k k
                k k k k k k                k k k k k k
                k k k k k k                k k k k k k

                k k k k k k
                k k k k k k
    persym.     k k k k k k
    ------->    k k k k k k
                k k k k k k
                k k k k k k

Of course, when computing a matrix that is both symmetric and persymmetric, 
such as $T_n^{-1}$, it is only necessary to compute the “upper wedge” of 
the matrix, e.g., 

    x x x x x x
      x x x x            (n = 6)
        x x

With this last observation, we are ready to present the overall algorithm. 


Algorithm 4.7.3 (Trench) Given real numbers $1 = r_0, r_1, \ldots, r_n$ such 
that $T = (r_{|i-j|}) \in \mathbb{R}^{n \times n}$ is positive definite, the following algorithm 
computes $B = T_n^{-1}$. Only those $b_{ij}$ for which $i \le j$ and $i + j \le n + 1$ are 
computed. 

    Use Algorithm 4.7.1 to solve T_{n−1} y = −(r_1, ..., r_{n−1})^T.
    γ = 1/(1 + r(1:n−1)^T y(1:n−1))
    v(1:n−1) = γ y(n−1:−1:1)
    B(1,1) = γ
    B(1,2:n) = v(n−1:−1:1)^T
    for i = 2:floor((n−1)/2) + 1
        for j = i:n−i+1
            B(i,j) = B(i−1,j−1) + (v(n+1−j)v(n+1−i) − v(i−1)v(j−1)) / γ
        end
    end


This algorithm requires $13n^2/4$ flops. 
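
A hedged plain-Python sketch of Algorithm 4.7.3 follows. It computes the upper wedge exactly as in the algorithm and then, as an extra convenience that is not part of the algorithm, fills the full inverse using symmetry and persymmetry. The names `durbin`, `trench`, and `wedge`, and the 1-based scratch arrays, are assumptions of this sketch.

```python
def durbin(r):
    """Algorithm 4.7.1 (sketch): solve T y = -r, with r = [r_1, ..., r_n]."""
    y, beta, alpha = [-r[0]], 1.0, -r[0]
    for k in range(1, len(r)):
        beta = (1.0 - alpha * alpha) * beta
        alpha = -(r[k] + sum(r[k - 1 - i] * y[i] for i in range(k))) / beta
        y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return y


def trench(r):
    """Sketch of Algorithm 4.7.3: inverse of T_n = (r_{|i-j|}), r_0 = 1.

    r is [r_1, ..., r_{n-1}] with n >= 2; returns an n-by-n list of lists.
    """
    n = len(r) + 1
    y = durbin(r)                                   # T_{n-1} y = -r
    gamma = 1.0 / (1.0 + sum(a * b for a, b in zip(r, y)))
    v = [0.0] + [gamma * t for t in reversed(y)]    # v[1..n-1], 1-based
    B = [[0.0] * (n + 1) for _ in range(n + 1)]     # B[1..n][1..n], 1-based
    B[1][1] = gamma
    for j in range(2, n + 1):
        B[1][j] = v[n + 1 - j]                      # B(1,2:n) = v(n-1:-1:1)^T
    for i in range(2, (n - 1) // 2 + 2):
        for j in range(i, n - i + 2):
            B[i][j] = B[i - 1][j - 1] + (
                v[n + 1 - j] * v[n + 1 - i] - v[i - 1] * v[j - 1]) / gamma

    def wedge(i, j):
        # Map (i, j) into the computed wedge: i <= j and i + j <= n + 1.
        if i > j:
            i, j = j, i                             # symmetry
        if i + j > n + 1:
            i, j = n + 1 - j, n + 1 - i             # persymmetry
        return B[i][j]

    return [[wedge(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]
```

For `r = [0.5, 0.2]` the wedge entries agree with those listed in Example 4.7.3.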


Example 4.7.3 If the above algorithm is applied to compute the inverse $B$ of the 
positive definite Toeplitz matrix 

$$\begin{bmatrix} 1 & .5 & .2 \\ .5 & 1 & .5 \\ .2 & .5 & 1 \end{bmatrix}$$

then we obtain $\gamma = 75/56$, $b_{11} = 75/56$, $b_{12} = -5/7$, $b_{13} = 5/56$, and $b_{22} = 12/7$. 


4.7.5 Stability Issues 


Error analyses for the above algorithms have been performed by Cybenko 
(1978), and we briefly report on some of his findings. 
The key quantities turn out to be the $\alpha_k$ in (4.7.1). In exact arithmetic 
these scalars satisfy 

$$|\alpha_k| < 1$$

and can be used to bound $\| T_n^{-1} \|_1$: 

$$\max\left\{ \frac{1}{\prod_{j=1}^{n-1}(1 - \alpha_j^2)},\ \frac{1}{\prod_{j=1}^{n-1}(1 - \alpha_j)} \right\} \le \| T_n^{-1} \|_1 \le \prod_{j=1}^{n-1} \frac{1 + |\alpha_j|}{1 - |\alpha_j|}. \qquad (4.7.6)$$

Moreover, the solution to the Yule-Walker system $T_n y = -r(1:n)$ satisfies 

$$\| y \|_1 = \left( \prod_{k=1}^{n} (1 + \alpha_k) \right) - 1 \qquad (4.7.7)$$

provided all the $\alpha_k$ are non-negative. 
Now if $\hat{x}$ is the computed Durbin solution to the Yule-Walker equations 
then $r_D = T_n \hat{x} + r$ can be bounded as follows 

$$\| r_D \| \approx u \prod_{k=1}^{n} (1 + |\hat{\alpha}_k|)$$

where $\hat{\alpha}_k$ is the computed version of $\alpha_k$. By way of comparison, since 
each $|r_i|$ is bounded by unity, it follows that $\| r_C \| \approx u \| y \|_1$ where $r_C$ is 
the residual associated with the computed solution obtained via Cholesky. 
Note that the two residuals are of comparable magnitude provided (4.7.7) 
holds. Experimental evidence suggests that this is the case even if some of 
the $\alpha_k$ are negative. Similar comments apply to the numerical behavior of 
the Levinson algorithm. 

For the Trench method, the computed inverse $\hat{B}$ of $T_n^{-1}$ can be shown 
to satisfy 

$$\frac{\| T_n^{-1} - \hat{B} \|_1}{\| T_n^{-1} \|_1} \approx u \prod_{k=1}^{n-1} \frac{1 + |\hat{\alpha}_k|}{1 - |\hat{\alpha}_k|}.$$

In light of (4.7.7) we see that the right-hand side is an approximate upper 
bound for $u \| T_n^{-1} \|$, which is approximately the size of the relative error 
when $T_n^{-1}$ is calculated using the Cholesky factorization. 

4.7.6 The Unsymmetric Case 


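Similar recursions can be developed for the unsymmetric case. Suppose we 
are given scalars $r_1, \ldots, r_{n-1}$, $p_1, \ldots, p_{n-1}$, and $b_1, \ldots, b_n$ and that we want 
to solve a linear system $Tx = b$ of the form 

$$\begin{bmatrix} 1 & r_1 & r_2 & r_3 & r_4 \\ p_1 & 1 & r_1 & r_2 & r_3 \\ p_2 & p_1 & 1 & r_1 & r_2 \\ p_3 & p_2 & p_1 & 1 & r_1 \\ p_4 & p_3 & p_2 & p_1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{bmatrix} \qquad (n = 5).$$

In the process that follows we require the leading principal submatrices 
$T_k = T(1:k, 1:k)$, $k = 1:n$, to be nonsingular. Using the same notation as 
above, it can be shown that if we have the solutions to the k-by-k systems 

$$\begin{aligned}
T_k^T y &= -r = -[r_1\ r_2\ \cdots\ r_k]^T \\
T_k w &= -p = -[p_1\ p_2\ \cdots\ p_k]^T \qquad (4.7.8) \\
T_k x &= b = [b_1\ b_2\ \cdots\ b_k]^T,
\end{aligned}$$

then we can obtain solutions to 

$$\begin{aligned}
\begin{bmatrix} T_k & E_k r \\ p^T E_k & 1 \end{bmatrix}^T \begin{bmatrix} z \\ \alpha \end{bmatrix} &= -\begin{bmatrix} r \\ r_{k+1} \end{bmatrix} \\
\begin{bmatrix} T_k & E_k r \\ p^T E_k & 1 \end{bmatrix} \begin{bmatrix} u \\ \nu \end{bmatrix} &= -\begin{bmatrix} p \\ p_{k+1} \end{bmatrix} \qquad (4.7.9) \\
\begin{bmatrix} T_k & E_k r \\ p^T E_k & 1 \end{bmatrix} \begin{bmatrix} v \\ \mu \end{bmatrix} &= \begin{bmatrix} b \\ b_{k+1} \end{bmatrix}
\end{aligned}$$

in $O(k)$ flops. This means that in principle it is possible to solve an 
unsymmetric Toeplitz system in $O(n^2)$ flops. However, the stability of the process 
cannot be assured unless the matrices $T_k = T(1:k, 1:k)$ are sufficiently well 
conditioned. 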


4.7.7 Circulant Systems 


A very important class of Toeplitz matrices are the circulant matrices. Here 
is an example: 

$$C(v) = \begin{bmatrix} v_0 & v_4 & v_3 & v_2 & v_1 \\ v_1 & v_0 & v_4 & v_3 & v_2 \\ v_2 & v_1 & v_0 & v_4 & v_3 \\ v_3 & v_2 & v_1 & v_0 & v_4 \\ v_4 & v_3 & v_2 & v_1 & v_0 \end{bmatrix}.$$

Notice that each column of a circulant is a “downshifted” version of its 
predecessor. In particular, if we define the downshift permutation $S_n$ by 

$$S_n = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix} \qquad (n = 5)$$

and $v = [v_0\ v_1\ \cdots\ v_{n-1}]^T$, then $C(v) = [\, v,\ S_n v,\ S_n^2 v,\ \ldots,\ S_n^{n-1} v \,]$. 
There are important connections between circulant matrices, Toeplitz 
matrices, and the DFT. First of all, it can be shown that 

$$C(v) = F_n^{-1}\, \mathrm{diag}(F_n v)\, F_n. \qquad (4.7.10)$$

This means that a product of the form $y = C(v)x$ can be computed at “FFT 
speed”: 

    x̃ = F_n x
    ṽ = F_n v
    z = ṽ .* x̃
    y = F_n^{-1} z

In other words, three DFTs and a vector multiply suffice to carry out the 
product of a circulant matrix and a vector. Products of this form are called 
convolutions and they are ubiquitous in signal processing and other areas. 

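Toeplitz-vector products can also be computed fast. The key idea is 
that any Toeplitz matrix can be “embedded” in a circulant. For example, 

$$T = \begin{bmatrix} 5 & 2 & 7 \\ 1 & 5 & 2 \\ 9 & 1 & 5 \end{bmatrix}$$

is the leading 3-by-3 submatrix of the circulant 

$$C = \begin{bmatrix} 5 & 2 & 7 & 9 & 1 \\ 1 & 5 & 2 & 7 & 9 \\ 9 & 1 & 5 & 2 & 7 \\ 7 & 9 & 1 & 5 & 2 \\ 2 & 7 & 9 & 1 & 5 \end{bmatrix}.$$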


In general, if $T = (t_{ij})$ is an n-by-n Toeplitz matrix, then $T = C(1:n, 1:n)$ 
where $C \in \mathbb{R}^{(2n-1) \times (2n-1)}$ is a circulant with 

$$C(:, 1) = \begin{bmatrix} T(1:n, 1) \\ T(1, n:-1:2)^T \end{bmatrix}.$$

Note that if $y = Cx$ and $x(n+1:2n-1) = 0$, then $y(1:n) = T x(1:n)$, showing 
that Toeplitz-vector products can also be computed at “FFT speed.” 
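
To make the embedding concrete, here is a small plain-Python sketch. It uses a naive $O(n^2)$ DFT purely to illustrate the identity (4.7.10); a real implementation would call an FFT routine instead. The function names `dft`, `circulant_times_vector`, and `toeplitz_times_vector` are assumptions of this sketch.

```python
import cmath


def dft(x, inverse=False):
    """Naive DFT: F_n x, or F_n^{-1} x when inverse=True (illustration only)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
        out.append(acc / n if inverse else acc)
    return out


def circulant_times_vector(v, x):
    """y = C(v) x via y = F^{-1}((F v) .* (F x)), where v is C's first column."""
    vt, xt = dft(v), dft(x)
    return dft([a * b for a, b in zip(vt, xt)], inverse=True)


def toeplitz_times_vector(col, row, x):
    """y = T x where col = T(:,1) and row = T(1,:), with col[0] == row[0]."""
    n = len(x)
    c = list(col) + list(reversed(row[1:]))   # C(:,1) = [T(1:n,1); T(1,n:-1:2)^T]
    xpad = list(x) + [0.0] * (n - 1)          # x(n+1:2n-1) = 0
    y = circulant_times_vector(c, xpad)
    return [val.real for val in y[:n]]        # real input gives real output
```

For the 4-by-4 Toeplitz example at the start of this section, `toeplitz_times_vector([3, 4, 0, 9], [3, 1, 7, 6], x)` should agree with the ordinary matrix-vector product up to roundoff.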
Problems 


P4.7.1 For any $v \in \mathbb{R}^n$ define the vectors $v_+ = (v + E_n v)/2$ and $v_- = (v - E_n v)/2$. 
Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric and persymmetric. Show that if $Ax = b$ then $Ax_+ = b_+$ 
and $Ax_- = b_-$. 

P4.7.2 Let $U \in \mathbb{R}^{n \times n}$ be the unit upper triangular matrix with the property that 
$U(1:k-1, k) = E_{k-1} y^{(k-1)}$ where $y^{(k)}$ is defined by (4.7.1). Show that 

$$U^T T_n U = \mathrm{diag}(1, \beta_1, \ldots, \beta_{n-1}).$$

P4.7.3 Suppose $z \in \mathbb{R}^n$ and that $S \in \mathbb{R}^{n \times n}$ is orthogonal. Show that if 

$$X = [\, z,\ Sz,\ \ldots,\ S^{n-1} z \,]$$

then $X^T X$ is Toeplitz. 

P4.7.4 Consider the $LDL^T$ factorization of an n-by-n symmetric, tridiagonal, positive 
definite Toeplitz matrix. Show that $d_n$ and $\ell_{n,n-1}$ converge as $n \to \infty$. 

P4.7.5 Show that the product of two lower triangular Toeplitz matrices is Toeplitz. 

P4.7.6 Give an algorithm for determining $\mu \in \mathbb{R}$ such that 

$$T_n + \mu\left(e_n e_1^T + e_1 e_n^T\right)$$

is singular. Assume $T_n = (r_{|i-j|})$ is positive definite, with $r_0 = 1$. 

P4.7.7 Rewrite Algorithm 4.7.2 so that it does not require the vectors $z$ and $v$. 

P4.7.8 Give an algorithm for computing $\kappa_\infty(T_k)$ for $k = 1:n$. 
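P4.7.9 Suppose $A_1$, $A_2$, $A_3$ and $A_4$ are m-by-m matrices and that 

$$A = \begin{bmatrix} A_1 & A_2 & A_3 & A_4 \\ A_4 & A_1 & A_2 & A_3 \\ A_3 & A_4 & A_1 & A_2 \\ A_2 & A_3 & A_4 & A_1 \end{bmatrix}.$$

Show that there is a permutation matrix $\Pi$ such that $\Pi^T A \Pi = C = (C_{ij})$ where each 
$C_{ij}$ is a 4-by-4 circulant matrix. 

P4.7.10 A p-by-p block matrix $A = (A_{ij})$ with m-by-m blocks is block Toeplitz if there 
exist $A_{-p+1}, \ldots, A_{-1}, A_0, A_1, \ldots, A_{p-1} \in \mathbb{R}^{m \times m}$ so that $A_{ij} = A_{j-i}$, e.g., 

$$A = \begin{bmatrix} A_0 & A_1 & A_2 & A_3 \\ A_{-1} & A_0 & A_1 & A_2 \\ A_{-2} & A_{-1} & A_0 & A_1 \\ A_{-3} & A_{-2} & A_{-1} & A_0 \end{bmatrix}.$$

(a) Show that there is a permutation $\Pi$ such that 

$$\Pi^T A \Pi = \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1m} \\ T_{21} & T_{22} & & \vdots \\ \vdots & & \ddots & \\ T_{m1} & \cdots & & T_{mm} \end{bmatrix}$$

where each $T_{ij}$ is p-by-p and Toeplitz. Each $T_{ij}$ should be “made up” of $(i,j)$ entries 
selected from the $A_k$ matrices. (b) What can you say about the $T_{ij}$ if $A_k = A_{-k}$, 
$k = 1:p-1$? 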

P4.7.11 Show how to compute the solutions to the systems in (4.7.9) given that the 
solutions to the systems in (4.7.8) are available. Assume that all the matrices involved 
are nonsingular. Proceed to develop a fast unsymmetric Toeplitz solver for $Tx = b$ 
assuming that $T$'s leading principal submatrices are all nonsingular. 

P4.7.12 A matrix $H \in \mathbb{R}^{n \times n}$ is Hankel if $H(n:-1:1, :)$ is Toeplitz. Show that if 
$A \in \mathbb{R}^{n \times n}$ is defined by 

$$a_{kj} = \int_a^b \cos(k\theta) \cos(j\theta)\, d\theta$$

then $A$ is the sum of a Hankel matrix and a Toeplitz matrix. Hint. Make use of the 
identity $\cos(u + v) = \cos(u)\cos(v) - \sin(u)\sin(v)$. 

P4.7.13 Verify that $F_n C(v) = \mathrm{diag}(F_n v) F_n$. 

P4.7.14 Show that it is possible to embed a symmetric Toeplitz matrix into a symmetric 
circulant matrix. 
P4.7.15 Consider the kth order Yule-Walker system $T_k y^{(k)} = -r^{(k)}$ that arises in 
(4.7.1): 

$$T_k \begin{bmatrix} y_{k1} \\ \vdots \\ y_{kk} \end{bmatrix} = -\begin{bmatrix} r_1 \\ \vdots \\ r_k \end{bmatrix}.$$

Show that if 

$$L = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ y_{11} & 1 & 0 & 0 & \cdots & 0 \\ y_{22} & y_{21} & 1 & 0 & \cdots & 0 \\ y_{33} & y_{32} & y_{31} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \ddots & \vdots \\ y_{n-1,n-1} & y_{n-1,n-2} & y_{n-1,n-3} & \cdots & y_{n-1,1} & 1 \end{bmatrix}$$

then $L T_n L^T = \mathrm{diag}(1, \beta_1, \ldots, \beta_{n-1})$ where $\beta_k = 1 + [r^{(k)}]^T y^{(k)}$. Thus, the Durbin 
algorithm can be thought of as a fast method for computing the $LDL^T$ factorization of 
$T_n^{-1}$. 

unstable Toeplitz techniques abound and caution must be exercised. See also 


G. Cybenko (1978). “Error Analysis of Some Signal Processing Algorithms,” Ph.D. 

Chapter 5 


Orthogonalization and 
Least Squares 


§5.1 Householder and Givens Matrices 
§5.2 The QR Factorization 
§5.3 The Full Rank LS Problem 
§5.4 Other Orthogonal Factorizations 
§5.5 The Rank Deficient LS Problem 
§5.6 Weighting and Iterative Improvement 
§5.7 Square and Underdetermined Systems 


This chapter is primarily concerned with the least squares solution of 
overdetermined systems of equations, i.e., the minimization of $\| Ax - b \|_2$ 
where $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and $b \in \mathbb{R}^m$. The most reliable solution 
procedures for this problem involve the reduction of $A$ to various canonical 
forms via orthogonal transformations. Householder reflections and Givens 
rotations are central to this process and we begin the chapter with a 
discussion of these important transformations. In §5.2 we discuss the computation 
of the factorization $A = QR$ where $Q$ is orthogonal and $R$ is upper 
triangular. This amounts to finding an orthonormal basis for ran($A$). The QR 
factorization can be used to solve the full rank least squares problem as we 
show in §5.3. The technique is compared with the method of normal 
equations after a perturbation theory is developed. In §5.4 and §5.5 we consider 
methods for handling the difficult situation when $A$ is rank deficient (or 
nearly so). QR with column pivoting and the SVD are featured. In §5.6 we 
discuss several steps that can be taken to improve the quality of a computed 
least squares solution. Some remarks about square and underdetermined 
systems are offered in §5.7. 


Before You Begin 


Chapters 1, 2, and 3 and §§4.1-4.3 are assumed. Within this chapter 
there are the following dependencies: 


§5.6 

↑ 
§5.1 → §5.2 → §5.3 → §5.4 → §5.5 

↓ 
§5.7 


Complementary references include Lawson and Hanson (1974), Farebrother 
(1987), and Björck (1996). See also Stewart (1973), Hager (1988), Stewart 
and Sun (1990), Watkins (1991), Gill, Murray, and Wright (1991), Higham 
(1996), Trefethen and Bau (1996), and Demmel (1996). Some MATLAB 
functions important to this chapter are qr, svd, pinv, orth, rank, and the 
“backslash” operator “\.” LAPACK connections include 


LAPACK: Householder/Givens Tools 

Householder times matrix 
Small n Householder times matrix 
Block Householder times matrix 
Computes $I - VTV^H$ block reflector representation 
Generates a plane rotation 
Generates a vector of plane rotations 
Applies a vector of plane rotations to a vector pair 
Applies rotation sequence to a matrix 
Real rotation times complex vector pair 
Complex rotation (c real) times complex vector pair 
Complex rotation (s real) times complex vector pair 


A = QR 
AΠ = QR 
Q (factored form) times matrix (real case) 
Q (factored form) times matrix (complex case) 

A = RQ = (upper triangular)(orthogonal) 
A = QL = (orthogonal)(lower triangular) 
A = LQ = (lower triangular)(orthogonal) 
A = RQ where A is upper trapezoidal 

Bidiagonalization of band matrix 


GELSS  SVD solution to min $\| AX - B \|_F$ 
GELSX  Complete orthogonal decomposition solution to min $\| AX - B \|_F$ 
Equilibrates general 

5.1 Householder and Givens Matrices 


Recall that $Q \in \mathbb{R}^{n \times n}$ is orthogonal if $Q^T Q = Q Q^T = I_n$. Orthogonal 
matrices have an important role to play in least squares and eigenvalue 
computations. In this section we introduce the key players in this game: 
Householder reflections and Givens rotations. 


5.1.1 A 2-by-2 Preview 


It is instructive to examine the geometry associated with rotations and 
reflections at the $n = 2$ level. A 2-by-2 orthogonal matrix $Q$ is a rotation 
if it has the form 

$$Q = \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix}.$$

If $y = Q^T x$, then $y$ is obtained by rotating $x$ counterclockwise through an 
angle $\theta$. 

A 2-by-2 orthogonal matrix $Q$ is a reflection if it has the form 

$$Q = \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ \sin(\theta) & -\cos(\theta) \end{bmatrix}.$$

If $y = Q^T x = Qx$, then $y$ is obtained by reflecting the vector $x$ across the 
line defined by 

$$S = \mathrm{span}\left\{ \begin{bmatrix} \cos(\theta/2) \\ \sin(\theta/2) \end{bmatrix} \right\}.$$

Reflections and rotations are computationally attractive because they are 
easily constructed and because they can be used to introduce zeros in a 
vector by properly choosing the rotation angle or the reflection plane. 


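Example 5.1.1 Suppose $x = [1, \sqrt{3}]^T$. If we set 

$$Q = \begin{bmatrix} \cos(-60^\circ) & \sin(-60^\circ) \\ -\sin(-60^\circ) & \cos(-60^\circ) \end{bmatrix} = \begin{bmatrix} 1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{bmatrix}$$

then $Q^T x = [2, 0]^T$. Thus, a rotation of $-60^\circ$ zeros the second component of $x$. If 

$$Q = \begin{bmatrix} \cos(60^\circ) & \sin(60^\circ) \\ \sin(60^\circ) & -\cos(60^\circ) \end{bmatrix} = \begin{bmatrix} 1/2 & \sqrt{3}/2 \\ \sqrt{3}/2 & -1/2 \end{bmatrix}$$

then $Q^T x = [2, 0]^T$. Thus, by reflecting $x$ across the $30^\circ$ line (i.e., taking $\theta = 60^\circ$ so 
that $\theta/2 = 30^\circ$) we can zero its second component. 

5.1.2 Householder Reflections 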
Let $v \in \mathbb{R}^n$ be nonzero. An n-by-n matrix $P$ of the form 

$$P = I - \frac{2}{v^T v}\, v v^T \qquad (5.1.1)$$

is called a Householder reflection. (Synonyms: Householder matrix, 
Householder transformation.) The vector $v$ is called a Householder vector. If a 
vector $x$ is multiplied by $P$, then it is reflected in the hyperplane $\mathrm{span}\{v\}^\perp$. 
It is easy to verify that Householder matrices are symmetric and orthogonal. 

Householder reflections are similar in two ways to Gauss transformations, 
which we introduced in §3.2.1. They are rank-1 modifications of the 
identity and they can be used to zero selected components of a vector. In 
particular, suppose we are given $0 \ne x \in \mathbb{R}^n$ and want $Px$ to be a multiple 
of $e_1 = I_n(:, 1)$. Note that 

$$Px = \left( I - \frac{2 v v^T}{v^T v} \right) x = x - \frac{2 v^T x}{v^T v}\, v$$

and $Px \in \mathrm{span}\{e_1\}$ imply $v \in \mathrm{span}\{x, e_1\}$. Setting $v = x + \alpha e_1$ gives 

$$v^T x = x^T x + \alpha x_1$$

and 

$$v^T v = x^T x + 2\alpha x_1 + \alpha^2,$$

and therefore 

$$Px = \left( 1 - 2\,\frac{x^T x + \alpha x_1}{x^T x + 2\alpha x_1 + \alpha^2} \right) x - 2\alpha\, \frac{v^T x}{v^T v}\, e_1.$$

In order for the coefficient of $x$ to be zero, we set $\alpha = \pm \| x \|_2$ for then 

$$v = x \pm \| x \|_2 e_1 \quad \Rightarrow \quad Px = \left( I - 2\,\frac{v v^T}{v^T v} \right) x = \mp \| x \|_2 e_1. \qquad (5.1.2)$$

It is this simple determination of $v$ that makes the Householder reflection 
so useful. 


Example 5.1.2 If $x = [3, 1, 5, 1]^T$ and $v = [9, 1, 5, 1]^T$, then 

$$P = I - 2\,\frac{v v^T}{v^T v} = \frac{1}{54} \begin{bmatrix} -27 & -9 & -45 & -9 \\ -9 & 53 & -5 & -1 \\ -45 & -5 & 29 & -5 \\ -9 & -1 & -5 & 53 \end{bmatrix}$$

has the property that $Px = [-6, 0, 0, 0]^T$. 

5.1.3 Computing the Householder Vector 


There are a number of important practical details associated with the 
determination of a Householder matrix, i.e., the determination of a Householder 
vector. One concerns the choice of sign in the definition of $v$ in (5.1.2). 
Setting 

$$v_1 = x_1 - \| x \|_2$$

has the nice property that $Px$ is a positive multiple of $e_1$. But this recipe is 
dangerous if $x$ is close to a positive multiple of $e_1$ because severe cancellation 
would occur. However, the formula 

$$v_1 = x_1 - \| x \|_2 = \frac{x_1^2 - \| x \|_2^2}{x_1 + \| x \|_2} = \frac{-(x_2^2 + \cdots + x_n^2)}{x_1 + \| x \|_2}$$

suggested by Parlett (1971) does not suffer from this defect in the $x_1 > 0$ 
case. 

In practice, it is handy to normalize the Householder vector so that 
$v(1) = 1$. This permits the storage of $v(2:n)$ where the zeros have been 
introduced in $x$, i.e., $x(2:n)$. We refer to $v(2:n)$ as the essential part of 
the Householder vector. Recalling that $\beta = 2/v^T v$ and letting length($x$) 
specify vector dimension, we obtain the following encapsulation: 


Algorithm 5.1.1 (Householder Vector) Given $x \in \mathbb{R}^n$, this function 
computes $v \in \mathbb{R}^n$ with $v(1) = 1$ and $\beta \in \mathbb{R}$ such that $P = I_n - \beta v v^T$ is 
orthogonal and $Px = \| x \|_2 e_1$. 

    function: [v, β] = house(x)
        n = length(x)
        σ = x(2:n)^T x(2:n)
        v = [ 1 ; x(2:n) ]
        if σ = 0
            β = 0
        else
            μ = sqrt(x(1)² + σ)
            if x(1) <= 0
                v(1) = x(1) − μ
            else
                v(1) = −σ/(x(1) + μ)
            end
            β = 2v(1)²/(σ + v(1)²)
            v = v/v(1)
        end


This algorithm involves about $3n$ flops and renders a computed Householder 
matrix that is orthogonal to machine precision, a concept discussed below. 
A production version of Algorithm 5.1.1 may involve a preliminary scaling 
of the $x$ vector ($x \leftarrow x/\| x \|$) to avoid overflow. 
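
A plain-Python sketch of Algorithm 5.1.1 is given below, together with a helper that applies $P = I - \beta v v^T$ to a vector without forming $P$. The helper name `apply_house` is an assumption of this sketch. Note that, exactly as in the algorithm above, the $\sigma = 0$ branch returns $\beta = 0$, so $P = I$ in that case.

```python
import math


def house(x):
    """Sketch of Algorithm 5.1.1: return (v, beta) with v[0] == 1."""
    sigma = sum(t * t for t in x[1:])
    v = [1.0] + [float(t) for t in x[1:]]
    if sigma == 0.0:
        beta = 0.0                      # x is already a multiple of e_1
    else:
        mu = math.sqrt(x[0] * x[0] + sigma)
        if x[0] <= 0:
            v1 = x[0] - mu
        else:
            v1 = -sigma / (x[0] + mu)   # Parlett's cancellation-free formula
        beta = 2.0 * v1 * v1 / (sigma + v1 * v1)
        v = [1.0] + [t / v1 for t in x[1:]]   # normalize so that v(1) = 1
    return v, beta


def apply_house(v, beta, x):
    """Return P x = x - beta * (v^T x) * v without forming P."""
    s = beta * sum(a * b for a, b in zip(v, x))
    return [xi - s * vi for xi, vi in zip(x, v)]
```

For $x = [3, 1, 5, 1]^T$ the sketch returns a vector with $v(1) = 1$, and `apply_house` maps $x$ to approximately $[6, 0, 0, 0]^T$, the positive multiple of $e_1$ promised by the algorithm.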


5.1.4 Applying Householder Matrices 


It is critical to exploit structure when applying a Householder reflection to 
a matrix. If $A \in \mathbb{R}^{m \times n}$ and $P = I - \beta v v^T \in \mathbb{R}^{m \times m}$, then 

$$PA = (I - \beta v v^T) A = A - v w^T$$

where $w = \beta A^T v$. Likewise, if $P = I - \beta v v^T \in \mathbb{R}^{n \times n}$, then 

$$AP = A(I - \beta v v^T) = A - w v^T$$

where $w = \beta A v$. Thus, an m-by-n Householder update involves a 
matrix-vector multiplication and an outer product update. It requires $4mn$ flops. 
Failure to recognize this and to treat $P$ as a general matrix increases work 
by an order of magnitude. Householder updates never entail the explicit 
formation of the Householder matrix. 

Both of the above Householder updates can be implemented in a way 
that exploits the fact that $v(1) = 1$. This feature can be important in the 
computation of $PA$ when $m$ is small and in the computation of $AP$ when 
$n$ is small. 
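
The update $PA = A - v w^T$ with $w = \beta A^T v$ can be sketched in a few lines of plain Python. The function name `house_times_matrix` and the list-of-rows matrix layout are assumptions of this sketch; it returns a new matrix rather than overwriting $A$.

```python
def house_times_matrix(v, beta, A):
    """Return P A = A - v w^T with w = beta * A^T v; P is never formed."""
    m, n = len(A), len(A[0])
    # Matrix-vector product: w = beta * A^T v
    w = [beta * sum(A[i][j] * v[i] for i in range(m)) for j in range(n)]
    # Outer product update: A - v w^T
    return [[A[i][j] - v[i] * w[j] for j in range(n)] for i in range(m)]
```

With $v = [9, 1, 5, 1]^T$ and $\beta = 2/v^T v$ from Example 5.1.2, a matrix whose first column is $[3, 1, 5, 1]^T$ has that column mapped to $[-6, 0, 0, 0]^T$.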

As an example of a Householder matrix update, suppose we want to 
overwrite $A \in \mathbb{R}^{m \times n}$ ($m \ge n$) with $B = Q^T A$ where $Q$ is an orthogonal 
matrix chosen so that $B(j+1:m, j) = 0$ for some $j$ that satisfies $1 \le j \le n$. 
In addition, suppose $A(j:m, 1:j-1) = 0$ and that we want to store the 
essential part of the Householder vector in $A(j+1:m, j)$. The following 
instructions accomplish this task: 

    [v, β] = house(A(j:m, j))
    A(j:m, j:n) = (I_{m−j+1} − βvv^T) A(j:m, j:n)
    A(j+1:m, j) = v(2:m−j+1)

From the computational point of view, we have applied an order $m - j + 1$ 
Householder matrix to the bottom $m - j + 1$ rows of $A$. However, 
mathematically we have also applied the m-by-m Householder matrix 

$$\tilde{P} = \begin{bmatrix} I_{j-1} & 0 \\ 0 & P \end{bmatrix} = I_m - \beta \tilde{v} \tilde{v}^T, \qquad \tilde{v} = \begin{bmatrix} 0 \\ v \end{bmatrix}$$

to $A$ in its entirety. Regardless, the “essential” part of the Householder 
vector can be recorded in the zeroed portion of $A$. 

5.1.5 Roundoff Properties 


The roundoff properties associated with Householder matrices are very 
favorable. Wilkinson (1965, pp. 152-62) shows that house produces a 
Householder vector $\hat{v}$ very near the exact $v$. If $\hat{P} = I - 2\hat{v}\hat{v}^T/\hat{v}^T\hat{v}$ then 

$$\| \hat{P} - P \|_2 = O(u)$$

meaning that $\hat{P}$ is orthogonal to machine precision. Moreover, the 
computed updates with $\hat{P}$ are close to the exact updates with $P$: 

$$\begin{aligned}
fl(\hat{P} A) &= P(A + E), & \| E \|_2 &= O(u \| A \|_2) \\
fl(A \hat{P}) &= (A + E)P, & \| E \|_2 &= O(u \| A \|_2).
\end{aligned}$$


5.1.6 Factored Form Representation 


Many Householder based factorization algorithms that are presented in the 
following sections compute products of Householder matrices 

$$Q = Q_1 Q_2 \cdots Q_r, \qquad Q_j = I - \beta_j v^{(j)} v^{(j)T} \qquad (5.1.3)$$

where $r \le n$ and each $v^{(j)}$ has the form 

$$v^{(j)} = (\underbrace{0, 0, \ldots, 0}_{j-1},\ 1,\ v^{(j)}_{j+1}, \ldots, v^{(j)}_n)^T.$$

It is usually not necessary to compute $Q$ explicitly even if it is involved in 
subsequent calculations. For example, if $C \in \mathbb{R}^{n \times q}$ and we wish to compute 
$Q^T C$, then we merely execute the loop 

    for j = 1:r
        C = Q_j C
    end

The storage of the Householder vectors $v^{(1)}, \ldots, v^{(r)}$ and the corresponding 
$\beta_j$ (if convenient) amounts to a factored form representation of $Q$. To 
illustrate the economies of the factored form representation, suppose that 
we have an array $A$ and that $A(j+1:n, j)$ houses $v^{(j)}(j+1:n)$, the essential 
part of the jth Householder vector. The overwriting of $C \in \mathbb{R}^{n \times q}$ with 
$Q^T C$ can then be implemented as follows: 

    for j = 1:r
        v(j:n) = [ 1 ; A(j+1:n, j) ]
        C(j:n, :) = (I − β_j v(j:n)v(j:n)^T) C(j:n, :)          (5.1.4)
    end

This involves about $2qr(2n - r)$ flops. If $Q$ is explicitly represented as an 
n-by-n matrix, $Q^T C$ would involve $2n^2 q$ flops. 

Of course, in some applications, it is necessary to explicitly form $Q$ 
(or parts of it). Two possible algorithms for computing the Householder 
product matrix $Q$ in (5.1.3) are forward accumulation, 

    Q = I_n
    for j = 1:r
        Q = Q Q_j
    end

and backward accumulation, 

    Q = I_n
    for j = r:−1:1
        Q = Q_j Q
    end

Recall that the leading $(j-1)$-by-$(j-1)$ portion of $Q_j$ is the identity. Thus, 
at the beginning of backward accumulation, $Q$ is “mostly the identity” and 
it gradually becomes full as the iteration progresses. This pattern can be 
exploited to reduce the number of required flops. In contrast, $Q$ is full 
in forward accumulation after the first step. For this reason, backward 
accumulation is cheaper and the strategy of choice: 

    Q = I_n
    for j = r:−1:1
        v(j:n) = [ 1 ; A(j+1:n, j) ]
        Q(j:n, j:n) = (I − β_j v(j:n)v(j:n)^T) Q(j:n, j:n)          (5.1.5)
    end

This involves about $4(n^2 r - n r^2 + r^3/3)$ flops. 


5.1.7 A Block Representation 


Suppose $Q = Q_1 \cdots Q_r$ is a product of n-by-n Householder matrices as 
in (5.1.3). Since each $Q_j$ is a rank-one modification of the identity, it 
follows from the structure of the Householder vectors that $Q$ is a rank-r 
modification of the identity and can be written in the form 

$$Q = I + W Y^T \qquad (5.1.6)$$

where $W$ and $Y$ are n-by-r matrices. The key to computing the block 
representation (5.1.6) is the following lemma. 

Lemma 5.1.1 Suppose $Q = I + W Y^T$ is an n-by-n orthogonal matrix with 
$W, Y \in \mathbb{R}^{n \times j}$. If $P = I - \beta v v^T$ with $v \in \mathbb{R}^n$ and $z = -\beta Q v$, then 

$$Q_+ = QP = I + W_+ Y_+^T$$

where $W_+ = [\, W\ z \,]$ and $Y_+ = [\, Y\ v \,]$ are each n-by-$(j+1)$. 

Proof. 

$$\begin{aligned}
QP &= (I + W Y^T)(I - \beta v v^T) = I + W Y^T - \beta Q v v^T \\
&= I + W Y^T + z v^T = I + [\, W\ z \,][\, Y\ v \,]^T. \qquad \Box
\end{aligned}$$

By repeatedly applying the lemma, we can generate the block representation 
of $Q$ in (5.1.3) from the factored form representation as follows: 


Algorithm 5.1.2 Suppose $Q = Q_1 \cdots Q_r$ is a product of n-by-n 
Householder matrices as described in (5.1.3). This algorithm computes matrices 
$W, Y \in \mathbb{R}^{n \times r}$ such that $Q = I + W Y^T$. 

    Y = v^(1)
    W = −β_1 v^(1)
    for j = 2:r
        z = −β_j (I_n + W Y^T) v^(j)
        W = [ W  z ]
        Y = [ Y  v^(j) ]
    end


This algorithm involves about $2r^2 n - 2r^3/3$ flops if the zeros in the $v^{(j)}$ are 
exploited. Note that $Y$ is merely the matrix of Householder vectors and is 
therefore unit lower triangular. Clearly, the central task in the generation 
of the WY representation (5.1.6) is the computation of the $W$ matrix. 
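
Algorithm 5.1.2 can be sketched in plain Python as follows. The function name `wy_representation` and the list-of-rows layout for $W$ and $Y$ are assumptions of this sketch; it does not exploit the leading zeros in the Householder vectors, so it does more arithmetic than the flop count quoted above.

```python
def wy_representation(vs, betas):
    """Sketch of Algorithm 5.1.2.

    vs    : list of r Householder vectors (each a list of length n)
    betas : list of the r corresponding scalars beta_j
    Returns (W, Y) as n-by-r lists of rows with Q_1 ... Q_r = I + W Y^T.
    """
    n = len(vs[0])
    W = [[-betas[0] * vs[0][i]] for i in range(n)]
    Y = [[vs[0][i]] for i in range(n)]
    for v, beta in zip(vs[1:], betas[1:]):
        cols = len(Y[0])
        # z = -beta * (I + W Y^T) v, formed as -beta * (v + W (Y^T v))
        ytv = [sum(Y[i][k] * v[i] for i in range(n)) for k in range(cols)]
        z = [-beta * (v[i] + sum(W[i][k] * ytv[k] for k in range(cols)))
             for i in range(n)]
        for i in range(n):
            W[i].append(z[i])       # W = [W z]
            Y[i].append(v[i])       # Y = [Y v]
    return W, Y
```

Forming $z$ as $-\beta_j (v + W(Y^T v))$ keeps the work at two matrix-vector products per step rather than building $I + W Y^T$ explicitly.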

The block representation for products of Householder matrices is attractive 
in situations where $Q$ must be applied to a matrix. Suppose $C \in \mathbb{R}^{n \times q}$. 
It follows that the operation 

$$C \leftarrow Q^T C = (I + W Y^T)^T C = C + Y(W^T C)$$

is rich in level-3 operations. On the other hand, if $Q$ is in factored form, 
$Q^T C$ is just rich in the level-2 operations of matrix-vector multiplication 
and outer product updates. Of course, in this context the distinction 
between level-2 and level-3 diminishes as $C$ gets narrower. 

We mention that the “WY” representation is not a generalized 
Householder transformation from the geometric point of view. True block 
reflectors have the form $Q = I - 2VV^T$ where $V \in \mathbb{R}^{n \times r}$ satisfies $V^T V = I_r$. 
See Schreiber and Parlett (1987) and also Schreiber and Van Loan (1989). 


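Example 5.1.3 If $n = 4$, $r = 2$, and $[1, .6, 0, .8]^T$ and $[0, 1, .8, .6]^T$ are the 
Householder vectors associated with $Q_1$ and $Q_2$ respectively, then 

$$Q_1 Q_2 = I_4 + W Y^T = I_4 + \begin{bmatrix} -1 & 1.080 \\ -.6 & -.352 \\ 0 & -.800 \\ -.8 & .264 \end{bmatrix} \begin{bmatrix} 1 & .6 & 0 & .8 \\ 0 & 1 & .8 & .6 \end{bmatrix}.$$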


5.1.8 Givens Rotations 


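Householder reflections are exceedingly useful for introducing zeros on a 
grand scale, e.g., the annihilation of all but the first component of a 
vector. However, in calculations where it is necessary to zero elements more 
selectively, Givens rotations are the transformation of choice. These are 
rank-two corrections to the identity of the form 

$$G(i,k,\theta) = \begin{bmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ 0 & \cdots & c & \cdots & s & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & -s & \cdots & c & \cdots & 0 \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{bmatrix} \qquad (5.1.7)$$

where $c = \cos(\theta)$ and $s = \sin(\theta)$ for some $\theta$, and the entries $c$, $s$, $-s$, $c$ occupy 
rows and columns $i$ and $k$. Givens rotations are clearly orthogonal. 
Premultiplication by $G(i,k,\theta)^T$ amounts to a counterclockwise rotation 
of $\theta$ radians in the $(i,k)$ coordinate plane. Indeed, if $x \in \mathbb{R}^n$ and $y = 
G(i,k,\theta)^T x$, then 

$$y_j = \begin{cases} c x_i - s x_k & j = i \\ s x_i + c x_k & j = k \\ x_j & j \ne i, k. \end{cases}$$

From these formulae it is clear that we can force $y_k$ to be zero by setting 

$$c = \frac{x_i}{\sqrt{x_i^2 + x_k^2}}, \qquad s = \frac{-x_k}{\sqrt{x_i^2 + x_k^2}}. \qquad (5.1.8)$$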
Thus, it is a simple matter to zero 4 specified entry in a vector by using a 
Givens rotation. In practice, there are better ways to compute c and s than 
(5.1.8). The following algorithm, for example, guards against overflow. 
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Algorithm 5.1.3 Given scalars a and b, this function computes c = cos(@) 


and s = sin(@) so - 
|- a] l-ie] 


function: [e, s] = givens(a, b) 


ifb=90 
e=1;s=0 
else 
if |b] > [al 
r= —a/b; s M VA TA c= sr 
else 
r= bfa; c= l1 T); s=er 
end 
end 


This algorithm requires 5 flops and a single square root. Note that it does 
not compute @ and so it does not involve inverse trigonometric functions. 
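
As an illustration, Algorithm 5.1.3 might be transcribed into Python roughly as follows. This is only a sketch: the function name `givens` mirrors the text, while the test values 3 and 4 are chosen here purely for demonstration.

```python
import math

def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]]^T applied to [a, b] giving [r, 0]."""
    if b == 0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b                      # |tau| <= 1, so 1 + tau^2 cannot overflow
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        c = s * tau
    else:
        tau = -b / a
        c = 1.0 / math.sqrt(1.0 + tau * tau)
        s = c * tau
    return c, s

# Rotate the pair (a, b) = (3, 4); the second component should vanish.
c, s = givens(3.0, 4.0)
r = c * 3.0 - s * 4.0        # first component of the rotated pair
zero = s * 3.0 + c * 4.0     # second component, zero up to roundoff
```

Note that r may come out negative: the algorithm fixes the rotation only up to the sign of r, which is harmless for zeroing purposes.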


Example 5.1.4 If $x = [1, 2, 3, 4]^T$, $\cos(\theta) = 1/\sqrt{5}$, and $\sin(\theta) = -2/\sqrt{5}$, then
$G(2,4,\theta)^T x = [1, \sqrt{20}, 3, 0]^T$.


5.1.9 Applying Givens Rotations 


It is critical that the simple structure of a Givens rotation matrix be exploited
when it is involved in a matrix multiplication. Suppose $A \in \mathbb{R}^{m \times n}$,
$c = \cos(\theta)$, and $s = \sin(\theta)$. If $G(i,k,\theta) \in \mathbb{R}^{m \times m}$, then the update
$A \leftarrow G(i,k,\theta)^T A$ affects just two rows of A,

$$A([i, k], :) = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T A([i, k], :)$$

and requires just 6n flops:

    for j = 1:n
        τ1 = A(i, j)
        τ2 = A(k, j)
        A(i, j) = cτ1 - sτ2
        A(k, j) = sτ1 + cτ2
    end

Likewise, if $G(i,k,\theta) \in \mathbb{R}^{n \times n}$, then the update $A \leftarrow AG(i,k,\theta)$ affects just
two columns of A,

$$A(:, [i, k]) = A(:, [i, k]) \begin{bmatrix} c & s \\ -s & c \end{bmatrix}$$

and requires just 6m flops:

    for j = 1:m
        τ1 = A(j, i)
        τ2 = A(j, k)
        A(j, i) = cτ1 - sτ2
        A(j, k) = sτ1 + cτ2
    end
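
The two update loops can be sketched in Python as below. The helper names `rotate_rows` and `rotate_cols` are hypothetical, matrices are stored as lists of rows, and indices are 0-based, so rows 2 and 4 of the text become indices 1 and 3. The usage lines reproduce the data of Example 5.1.4.

```python
import math

def rotate_rows(A, i, k, c, s):
    """Overwrite A with G(i,k,theta)^T A; only rows i and k change."""
    for j in range(len(A[0])):
        t1, t2 = A[i][j], A[k][j]
        A[i][j] = c * t1 - s * t2
        A[k][j] = s * t1 + c * t2

def rotate_cols(A, i, k, c, s):
    """Overwrite A with A G(i,k,theta); only columns i and k change."""
    for row in A:
        t1, t2 = row[i], row[k]
        row[i] = c * t1 - s * t2
        row[k] = s * t1 + c * t2

# Example 5.1.4: x = [1, 2, 3, 4]^T stored as a 4-by-1 matrix.
c, s = 1.0 / math.sqrt(5.0), -2.0 / math.sqrt(5.0)
x = [[1.0], [2.0], [3.0], [4.0]]
rotate_rows(x, 1, 3, c, s)   # rows 2 and 4 in the 1-based numbering of the text
```

Neither helper ever forms the full rotation matrix, which is the point of the 6n and 6m flop counts.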


5.1.10 Roundoff Properties 


The numerical properties of Givens rotations are as favorable as those for
Householder reflections. In particular, it can be shown that the computed
$\hat{c}$ and $\hat{s}$ in givens satisfy

$$\hat{c} = c(1 + \epsilon_c), \qquad \epsilon_c = O(u)$$
$$\hat{s} = s(1 + \epsilon_s), \qquad \epsilon_s = O(u).$$

If $\hat{c}$ and $\hat{s}$ are subsequently used in a Givens update, then the computed
update is the exact update of a nearby matrix:

$$fl[\hat{G}(i,k,\theta)^T A] = G(i,k,\theta)^T (A + E), \qquad \| E \|_2 \approx u \| A \|_2$$
$$fl[A \hat{G}(i,k,\theta)] = (A + E) G(i,k,\theta), \qquad \| E \|_2 \approx u \| A \|_2.$$

A detailed error analysis of Givens rotations may be found in Wilkinson
(1965, pp. 131-39).


5.1.11 Representing Products of Givens Rotations


Suppose $Q = G_1 \cdots G_t$ is a product of Givens rotations. As we have seen in
connection with Householder reflections, it is more economical to keep the
orthogonal matrix Q in factored form than to compute explicitly the product
of the rotations. Using a technique demonstrated by Stewart (1976),
it is possible to do this in a very compact way. The idea is to associate a
single floating point number $\rho$ with each rotation. Specifically, if

$$Z = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}, \qquad c^2 + s^2 = 1$$

then we define the scalar $\rho$ by

    if c = 0
        ρ = 1
    elseif |s| < |c|
        ρ = sign(c)s/2                                   (5.1.9)
    else
        ρ = 2 sign(s)/c
    end

Essentially, this amounts to storing s/2 if the sine is smaller and 2/c if the
cosine is smaller. With this encoding, it is possible to reconstruct $\pm Z$ as
follows:

    if ρ = 1
        c = 0; s = 1
    elseif |ρ| < 1
        s = 2ρ; c = sqrt(1 - s^2)                        (5.1.10)
    else
        c = 2/ρ; s = sqrt(1 - c^2)
    end

That $-Z$ may be generated is usually of no consequence, for if Z zeros a
particular matrix entry, so does $-Z$. The reason for essentially storing the
smaller of c and s is that the formula $\sqrt{1 - x^2}$ renders poor results if x is
near unity. More details may be found in Stewart (1976). Of course, to
“reconstruct” $G(i,k,\theta)$ we need i and k in addition to the associated $\rho$.
This usually poses no difficulty as we discuss in §5.2.3.
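
A small Python sketch of the encoding (5.1.9) and the decoding (5.1.10) may make the round trip concrete. The function names `encode_rotation` and `decode_rotation` are hypothetical, and the sample pairs are chosen here only for illustration.

```python
import math

def encode_rotation(c, s):
    """Pack the pair (c, s), with c^2 + s^2 = 1, into one number rho as in (5.1.9)."""
    if c == 0:
        return 1.0
    if abs(s) < abs(c):
        return math.copysign(1.0, c) * s / 2.0   # |rho| <= 1/2
    return 2.0 * math.copysign(1.0, s) / c       # |rho| >= 2

def decode_rotation(rho):
    """Recover (c, s) up to a common sign from rho as in (5.1.10)."""
    if rho == 1:
        return 0.0, 1.0
    if abs(rho) < 1:
        s = 2.0 * rho
        return math.sqrt(1.0 - s * s), s
    c = 2.0 / rho
    return c, math.sqrt(1.0 - c * c)

# Round trips: the decoded pair equals (c, s) or (-c, -s).
samples = [(0.8, 0.6), (-0.8, 0.6), (0.6, -0.8), (0.0, -1.0), (1.0, 0.0)]
decoded = [decode_rotation(encode_rotation(c, s)) for c, s in samples]
```

In each branch the square root is taken of one minus the square of the smaller of the two quantities, which is the accuracy consideration mentioned in the text.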


5.1.12 Error Propagation 


We offer some remarks about the propagation of roundoff error in algorithms
that involve sequences of Householder/Givens updates. To be precise,
suppose $A = A_0 \in \mathbb{R}^{m \times n}$ is given and that matrices $A_1, \ldots, A_p = B$
are generated via the formula

$$A_k = fl(\hat{Q}_k A_{k-1} \hat{Z}_k), \qquad k = 1{:}p.$$

Assume that the above Householder and Givens algorithms are used for
both the generation and application of the $\hat{Q}_k$ and $\hat{Z}_k$. Let $Q_k$ and $Z_k$ be
the orthogonal matrices that would be produced in the absence of roundoff.
It can be shown that

$$B = (Q_p \cdots Q_1)(A + E)(Z_1 \cdots Z_p), \qquad (5.1.11)$$

where $\| E \|_2 \le c\,u \| A \|_2$ and c is a constant that depends mildly on n, m,
and p. In plain English, B is an exact orthogonal update of a matrix near
to A.


5.1.13 Fast Givens Transformations 


The ability to introduce zeros in a selective fashion makes Givens rotations
an important zeroing tool in certain structured problems. This has led to
the development of “fast Givens” procedures. The fast Givens idea amounts
to a clever representation of Q when Q is the product of Givens rotations.
In particular, Q is represented by a matrix pair (M, D) where $M^T M = D = \mathrm{diag}(d_i)$
and each $d_i$ is positive. The matrices Q, M, and D are connected
through the formula

$$Q = M D^{-1/2} = M \,\mathrm{diag}(1/\sqrt{d_i}).$$

Note that $(MD^{-1/2})^T (MD^{-1/2}) = D^{-1/2} D D^{-1/2} = I$ and so the matrix
$MD^{-1/2}$ is orthogonal. Moreover, if F is an n-by-n matrix with
$F^T D F = D_{new}$ diagonal, then $M_{new}^T M_{new} = D_{new}$ where $M_{new} = MF$.
Thus, it is possible to update the fast Givens representation (M, D) to obtain
$(M_{new}, D_{new})$. For this idea to be of practical interest, we must show
how to give F zeroing capabilities subject to the constraint that it “keeps”
D diagonal.

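The details are best explained at the 2-by-2 level. Let $x = [x_1 \; x_2]^T$ and
$D = \mathrm{diag}(d_1, d_2)$ be given and assume that $d_1$ and $d_2$ are positive. Define

$$M_1 = \begin{bmatrix} \beta_1 & 1 \\ 1 & \alpha_1 \end{bmatrix} \qquad (5.1.12)$$

and observe that

$$M_1^T x = \begin{bmatrix} \beta_1 x_1 + x_2 \\ x_1 + \alpha_1 x_2 \end{bmatrix}$$

and

$$M_1^T D M_1 = \begin{bmatrix} d_2 + \beta_1^2 d_1 & d_1\beta_1 + d_2\alpha_1 \\ d_1\beta_1 + d_2\alpha_1 & d_1 + \alpha_1^2 d_2 \end{bmatrix} \equiv D_1.$$

If $x_2 \neq 0$, $\alpha_1 = -x_1/x_2$, and $\beta_1 = -\alpha_1 d_2/d_1$, then

$$M_1^T x = \begin{bmatrix} x_2(1 + \gamma_1) \\ 0 \end{bmatrix}$$

$$M_1^T D M_1 = \begin{bmatrix} d_2(1 + \gamma_1) & 0 \\ 0 & d_1(1 + \gamma_1) \end{bmatrix}$$

where $\gamma_1 = -\alpha_1\beta_1 = (d_2/d_1)(x_1/x_2)^2$.
Analogously, if we assume $x_1 \neq 0$ and define $M_2$ by

$$M_2 = \begin{bmatrix} 1 & \alpha_2 \\ \beta_2 & 1 \end{bmatrix} \qquad (5.1.13)$$

where $\alpha_2 = -x_2/x_1$ and $\beta_2 = -(d_1/d_2)\alpha_2$, then

$$M_2^T x = \begin{bmatrix} x_1(1 + \gamma_2) \\ 0 \end{bmatrix}$$

and

$$M_2^T D M_2 = \begin{bmatrix} d_1(1 + \gamma_2) & 0 \\ 0 & d_2(1 + \gamma_2) \end{bmatrix} \equiv D_2,$$

where $\gamma_2 = -\alpha_2\beta_2 = (d_1/d_2)(x_2/x_1)^2$.

It is easy to show that for either i = 1 or 2, the matrix $J = D^{1/2} M_i D_i^{-1/2}$
is orthogonal and that it is designed so that the second component of
$J^T(D^{-1/2}x)$ is zero. (J may actually be a reflection and thus it is half-
correct to use the popular term "fast Givens.")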

Notice that the $\gamma_i$ satisfy $\gamma_1\gamma_2 = 1$. Thus, we can always select $M_i$ in
the above so that the "growth factor" $(1 + \gamma_i)$ is bounded by 2. Matrices
of the form

$$M_1 = \begin{bmatrix} \beta_1 & 1 \\ 1 & \alpha_1 \end{bmatrix} \qquad M_2 = \begin{bmatrix} 1 & \alpha_2 \\ \beta_2 & 1 \end{bmatrix}$$

that satisfy $-1 \le \alpha_i\beta_i \le 0$ are 2-by-2 fast Givens transformations. Notice
that premultiplication by a fast Givens transformation involves half the
number of multiplies as premultiplication by an “ordinary” Givens transformation.
Also, the zeroing is carried out without an explicit square root.

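In the n-by-n case, everything "scales up" as with ordinary Givens rotations.
The "type 1" transformations have the form

$$F(i,k,\alpha,\beta) = \begin{bmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ 0 & \cdots & \beta & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & 1 & \cdots & \alpha & \cdots & 0 \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{bmatrix} \qquad (5.1.14)$$

while the "type 2" transformations are structured as follows:

$$F(i,k,\alpha,\beta) = \begin{bmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 1 & \cdots & \alpha & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & \beta & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{bmatrix} \qquad (5.1.15)$$

In both displays the four distinguished entries lie at the intersections of
rows i and k with columns i and k. Encapsulating all this we obtain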

Algorithm 5.1.4 Given $x \in \mathbb{R}^2$ and positive $d \in \mathbb{R}^2$, the following algorithm
computes a 2-by-2 fast Givens transformation M such that the
second component of $M^T x$ is zero and $M^T D M = D_1$ is diagonal where
$D = \mathrm{diag}(d_1, d_2)$. If type = 1 then M has the form (5.1.12) while if type = 2
then M has the form (5.1.13). The diagonal elements of $D_1$ overwrite d.

    function: [α, β, type] = fast.givens(x, d)
        if x(2) ≠ 0
            α = -x(1)/x(2); β = -αd(2)/d(1); γ = -αβ
            if γ ≤ 1
                type = 1
                τ = d(1); d(1) = (1 + γ)d(2); d(2) = (1 + γ)τ
            else
                type = 2
                α = 1/α; β = 1/β; γ = 1/γ
                d(1) = (1 + γ)d(1); d(2) = (1 + γ)d(2)
            end
        else
            type = 2
            α = 0; β = 0
        end

The application of fast Givens transformations is analogous to that for
ordinary Givens transformations. Even with the appropriate type of transformation
used, the growth factor $1 + \gamma$ may still be as large as two. Thus,
$2^s$ growth can occur in the entries of D and M after s updates. This means
that the diagonal D must be monitored during a fast Givens procedure to
avoid overflow. See Anda and Park (1994) for how to do this efficiently.

Nevertheless, element growth in M and D is controlled because at all
times we have $MD^{-1/2}$ orthogonal. The roundoff properties of a fast Givens
procedure are what we would expect of a Givens matrix technique. For example,
if we computed $\hat{Q} = fl(\hat{M}\hat{D}^{-1/2})$ where $\hat{M}$ and $\hat{D}$ are the computed
M and D, then $\hat{Q}$ is orthogonal to working precision: $\| \hat{Q}^T \hat{Q} - I \|_2 \approx u$.


Problems 


P5.1.1 Execute house with $x = [1, 7, 2, 3, -1]^T$.

P5.1.2 Let x and y be nonzero vectors in $\mathbb{R}^n$. Give an algorithm for determining a
Householder matrix P such that Px is a multiple of y.

P5.1.3 Suppose $x \in \mathbb{C}^n$ and that $x_1 = |x_1| e^{i\theta}$ with $\theta \in \mathbb{R}$. Assume $x \neq 0$ and
define $u = x + e^{i\theta} \| x \|_2 e_1$. Show that $P = I - 2uu^H/u^H u$ is unitary and that
$Px = -e^{i\theta} \| x \|_2 e_1$.

P5.1.4 Use Householder matrices to show that $\det(I + xy^T) = 1 + x^T y$ where x and y
are given n-vectors.

P5.1.5 Suppose $x \in \mathbb{C}^2$. Give an algorithm for determining a unitary matrix of the
form

$$Q = \begin{bmatrix} c & \bar{s} \\ -s & c \end{bmatrix}, \qquad c \in \mathbb{R}, \quad c^2 + |s|^2 = 1$$

such that the second component of $Q^H x$ is zero.

P5.1.6 Suppose x and y are unit vectors in $\mathbb{R}^n$. Give an algorithm using Givens
transformations which computes an orthogonal Q such that $Q^T x = y$.

P5.1.7 Determine $c = \cos(\theta)$ and $s = \sin(\theta)$ such that
[i] Es] ES: 


P5.1.8 Suppose that $Q = I + YTY^T$ is orthogonal where $Y \in \mathbb{R}^{n \times j}$ and $T \in \mathbb{R}^{j \times j}$ is
upper triangular. Show that if $Q_+ = QP$ where $P = I - 2vv^T/v^T v$ is a Householder
matrix, then $Q_+$ can be expressed in the form $Q_+ = I + Y_+ T_+ Y_+^T$ where $Y_+ \in \mathbb{R}^{n \times (j+1)}$
and $T_+ \in \mathbb{R}^{(j+1) \times (j+1)}$ is upper triangular.

P5.1.9 Give a detailed implementation of Algorithm 5.1.2 with the assumption that
$v^{(j)}(j+1{:}n)$, the essential part of the jth Householder vector, is stored in A(j + 1:n, j).
Since Y is effectively represented in A, your procedure need only set up the W matrix.

P5.1.10 Show that if S is skew-symmetric ($S^T = -S$), then $Q = (I + S)(I - S)^{-1}$ is
orthogonal. (Q is called the Cayley transform of S.) Construct a rank-2 S so that if x
is a vector then Qx is zero except in the first component.

P5.1.11 Suppose $P \in \mathbb{R}^{n \times n}$ satisfies $\| P^T P - I_n \|_2 = \epsilon < 1$. Show that all the singular
values of P are in the interval $[1 - \epsilon, 1 + \epsilon]$ and that $\| P - UV^T \|_2 \le \epsilon$ where $P = U\Sigma V^T$
is the SVD of P.

P5.1.12 Suppose $A \in \mathbb{R}^{2 \times 2}$. Under what conditions is the closest rotation to A closer
than the closest reflection to A?

The Givens rotation storage scheme discussed in the text is detailed in

G.W. Stewart (1976). “The Economical Storage of Plane Rotations,” Numer. Math.
25, 137-38.

Fast Givens transformations are also referred to as "square-root-free" Givens transformations.
(Recall that a square root must ordinarily be computed during the formation
of a Givens transformation.) There are several ways fast Givens calculations can be arranged.
See

M. Gentleman (1973). “Least Squares Computations by Givens Transformations without
J.H. Wilkinson (1977). “Some Recent Advances in Numerical Linear Algebra,” in The
State of the Art in Numerical Analysis, ed. D.A.H. Jacobs, Academic Press, New

5.2 The QR Factorization


We now show how Householder and Givens transformations can be used to 
compute various factorizations, beginning with the QR factorization. The 
QR factorization of an m-by-n matrix A is given by 


A=QR 


where $Q \in \mathbb{R}^{m \times m}$ is orthogonal and $R \in \mathbb{R}^{m \times n}$ is upper triangular. In this
section we assume $m \ge n$. We will see that if A has full column rank,
then the first n columns of Q form an orthonormal basis for ran(A). Thus,
calculation of the QR factorization is one way to compute an orthonormal
basis for a set of vectors. This computation can be arranged in several ways.
We give methods based on Householder, block Householder, Givens, and
fast Givens transformations. The Gram-Schmidt orthogonalization process
and a numerically more stable variant called modified Gram-Schmidt are
also discussed.

5.2.1 Householder QR


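We begin with a QR factorization method that utilizes Householder transformations.
The essence of the algorithm can be conveyed by a small example.
Suppose m = 6, n = 5, and assume that Householder matrices $H_1$
and $H_2$ have been computed so that

$$H_2 H_1 A = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \boxed{\times} & \times & \times \\ 0 & 0 & \boxed{\times} & \times & \times \\ 0 & 0 & \boxed{\times} & \times & \times \\ 0 & 0 & \boxed{\times} & \times & \times \end{bmatrix}$$

Concentrating on the highlighted entries, we determine a Householder matrix
$\tilde{H}_3 \in \mathbb{R}^{4 \times 4}$ such that

$$\tilde{H}_3 \begin{bmatrix} a_{33} \\ a_{43} \\ a_{53} \\ a_{63} \end{bmatrix} = \begin{bmatrix} \times \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$

If $H_3 = \mathrm{diag}(I_2, \tilde{H}_3)$, then

$$H_3 H_2 H_1 A = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times & \times \end{bmatrix}$$

After n such steps we obtain an upper triangular $H_n H_{n-1} \cdots H_1 A = R$ and
so by setting $Q = H_1 \cdots H_n$ we obtain A = QR.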


Algorithm 5.2.1 (Householder QR) Given $A \in \mathbb{R}^{m \times n}$ with $m \ge n$,
the following algorithm finds Householder matrices $H_1, \ldots, H_n$ such that if
$Q = H_1 \cdots H_n$, then $Q^T A = R$ is upper triangular. The upper triangular
part of A is overwritten by the upper triangular part of R and components
j + 1:m of the jth Householder vector are stored in A(j + 1:m, j), j < m.

    for j = 1:n
        [v, β] = house(A(j:m, j))
        A(j:m, j:n) = (I_{m-j+1} - βvv^T) A(j:m, j:n)
        if j < m
            A(j + 1:m, j) = v(2:m - j + 1)
        end
    end

This algorithm requires $2n^2(m - n/3)$ flops.
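To clarify how A is overwritten, if

$$v^{(j)} = [\,\underbrace{0, \ldots, 0}_{j-1},\, 1,\, v_{j+1}^{(j)}, \ldots, v_m^{(j)}\,]^T$$

is the jth Householder vector, then upon completion

$$A = \begin{bmatrix} r_{11} & r_{12} & r_{13} & r_{14} & r_{15} \\ v_2^{(1)} & r_{22} & r_{23} & r_{24} & r_{25} \\ v_3^{(1)} & v_3^{(2)} & r_{33} & r_{34} & r_{35} \\ v_4^{(1)} & v_4^{(2)} & v_4^{(3)} & r_{44} & r_{45} \\ v_5^{(1)} & v_5^{(2)} & v_5^{(3)} & v_5^{(4)} & r_{55} \\ v_6^{(1)} & v_6^{(2)} & v_6^{(3)} & v_6^{(4)} & v_6^{(5)} \end{bmatrix}$$

If the matrix $Q = H_1 \cdots H_n$ is required, then it can be accumulated using
(5.1.5). This accumulation requires $4(m^2 n - mn^2 + n^3/3)$ flops.

The computed upper triangular matrix $\hat{R}$ is the exact R for a nearby A
in the sense that $Z^T(A + E) = \hat{R}$ where Z is some exact orthogonal matrix
and $\| E \|_2 \approx u \| A \|_2$.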
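
The overwriting scheme of Algorithm 5.2.1 can be sketched in Python as follows. This is an illustration, not the book's code: the `house` helper here is a simplified stand-in (an assumption) that normalizes v so that v(1) = 1 and picks the sign that avoids cancellation, so the diagonal of R may come out negative. Matrices are lists of rows with 0-based indices.

```python
import math

def house(x):
    """Return (v, beta) with v[0] = 1 so that (I - beta v v^T) x is a multiple of e_1."""
    norm = math.sqrt(sum(t * t for t in x))
    if norm == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0
    v0 = x[0] + math.copysign(norm, x[0])      # same signs are added: no cancellation
    v = [1.0] + [t / v0 for t in x[1:]]
    return v, 2.0 / sum(t * t for t in v)

def householder_qr(A):
    """Overwrite A (m rows, n columns, m >= n) with R and the essential parts of the v's."""
    m, n = len(A), len(A[0])
    for j in range(n):
        v, beta = house([A[i][j] for i in range(j, m)])
        for col in range(j, n):
            # w = beta * v^T A(j:m, col); then A(j:m, col) = A(j:m, col) - w v
            w = beta * sum(v[i - j] * A[i][col] for i in range(j, m))
            for i in range(j, m):
                A[i][col] -= w * v[i - j]
        for i in range(j + 1, m):
            A[i][j] = v[i - j]                 # store v(2:m-j+1) below the diagonal
    return A

A0 = [[3.0, 1.0], [4.0, 2.0], [0.0, 2.0]]
A = householder_qr([row[:] for row in A0])
r11, r12, r22 = A[0][0], A[0][1], A[1][1]     # upper triangle now holds R
```

Because Q is orthogonal, $R^T R = A^T A$, which gives a cheap consistency check on the computed triangle without forming Q.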


5.2.2 Block Householder QR Factorization 


Algorithm 5.2.1 is rich in the level-2 operations of matrix-vector multi- 
plication and outer product updates. By reorganizing the computation 
and using the block Householder representation discussed in §5.1.7 we can 
obtain a level-3 procedure. The idea is to apply clusters of Householder 
transformations that are represented in the WY form of §5.1.7. 

A small example illustrates the main idea. Suppose n = 12 and that
the “blocking parameter” r has the value r = 3. The first step is to generate
Householders $H_1$, $H_2$, and $H_3$ as in Algorithm 5.2.1. However, unlike
Algorithm 5.2.1 where the $H_i$ are applied to all of A, we only apply $H_1$,
$H_2$, and $H_3$ to A(:, 1:3). After this is accomplished we generate the block
representation $H_1 H_2 H_3 = I + W_1 Y_1^T$ and then perform the level-3 update

$$A(:, 4{:}12) = (I + W_1 Y_1^T)^T A(:, 4{:}12).$$

Next, we generate $H_4$, $H_5$, and $H_6$ as in Algorithm 5.2.1. However, these
transformations are not applied to A(:, 7:12) until their block representation
$H_4 H_5 H_6 = I + W_2 Y_2^T$ is found. This illustrates the general pattern.

    λ = 1; k = 0
    while λ ≤ n
        τ = min(λ + r - 1, n); k = k + 1
        Using Algorithm 5.2.1, upper triangularize A(λ:m, λ:τ)
            generating Householder matrices H_λ, ..., H_τ.           (5.2.1)
        Use Algorithm 5.1.2 to get the block representation
            I + W_k Y_k^T = H_λ ··· H_τ.
        A(λ:m, τ + 1:n) = (I + W_k Y_k^T)^T A(λ:m, τ + 1:n)
        λ = τ + 1
    end

The zero-nonzero structure of the Householder vectors that define the matrices
$H_\lambda, \ldots, H_\tau$ implies that the first $\lambda - 1$ rows of $W_k$ and $Y_k$ are zero.
This fact would be exploited in a practical implementation.

The proper way to regard (5.2.1) is through the partitioning

$$A = [A_1, \ldots, A_N], \qquad N = \mathrm{ceil}(n/r)$$

where block column $A_k$ is processed during the kth step. In the kth step of
(5.2.1), a block Householder is formed that zeros the subdiagonal portion
of $A_k$. The remaining block columns are then updated.

The roundoff properties of (5.2.1) are essentially the same as those for 
Algorithm 5.2.1. There is a slight increase in the number of flops required 
because of the W-matrix computations. However, as a result of the block- 
ing, all but a small fraction of the flops occur in the context of matrix mul- 
tiplication. In particular, the level-3 fraction of (5.2.1) is approximately 
1 — 2/N. See Bischof and Van Loan (1987) for further details. 


5.2.3 Givens QR Methods


Givens rotations can also be used to compute the QR factorization. The 
4-by-3 case illustrates the general idea: 


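$$\begin{bmatrix} \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \end{bmatrix} \xrightarrow{(3,4)} \begin{bmatrix} \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \\ 0 & \times & \times \end{bmatrix} \xrightarrow{(2,3)} \begin{bmatrix} \times & \times & \times \\ \times & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \end{bmatrix} \xrightarrow{(1,2)} \begin{bmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \end{bmatrix}$$

$$\xrightarrow{(3,4)} \begin{bmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \\ 0 & 0 & \times \end{bmatrix} \xrightarrow{(2,3)} \begin{bmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & 0 & \times \\ 0 & 0 & \times \end{bmatrix} \xrightarrow{(3,4)} R$$

Each arrow is labeled with the pair of rows being rotated. The 2-vector that
defines each underlying Givens rotation consists of the entries of those two
rows in the column where the new zero appears. Clearly, if $G_j$ denotes the
jth Givens rotation in the reduction, then $Q^T A = R$ is upper triangular
where $Q = G_1 \cdots G_t$ and t is the total number of rotations. For general
m and n we have: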


Algorithm 5.2.2 (Givens QR) Given $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, the following
algorithm overwrites A with $Q^T A = R$, where R is upper triangular
and Q is orthogonal.

    for j = 1:n
        for i = m:-1:j + 1
            [c, s] = givens(A(i - 1, j), A(i, j))
            A(i - 1:i, j:n) = [c s; -s c]^T A(i - 1:i, j:n)
        end
    end

This algorithm requires $3n^2(m - n/3)$ flops. Note that we could use (5.1.9)
to encode (c, s) in a single number $\rho$ which could then be stored in the zeroed
entry A(i, j). An operation such as $x \leftarrow Q^T x$ could then be implemented
by using (5.1.10), taking care to reconstruct the rotations in the proper
order.
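
A compact Python rendering of Algorithm 5.2.2 is sketched below. The function names follow the text, the matrix is a list of rows with 0-based indices, and `givens` is repeated so that the block stands alone; the 3-by-2 test matrix is made up for illustration.

```python
import math

def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]]^T applied to [a, b] giving [r, 0]."""
    if b == 0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / math.sqrt(1.0 + tau * tau)
    return c, c * tau

def givens_qr(A):
    """Overwrite A (m rows, n columns, m >= n) with R = Q^T A, column by column."""
    m, n = len(A), len(A[0])
    for j in range(n):
        for i in range(m - 1, j, -1):          # i = m:-1:j+1 in 1-based terms
            c, s = givens(A[i - 1][j], A[i][j])
            for col in range(j, n):            # rotate rows i-1 and i
                t1, t2 = A[i - 1][col], A[i][col]
                A[i - 1][col] = c * t1 - s * t2
                A[i][col] = s * t1 + c * t2
    return A

R = givens_qr([[3.0, 1.0], [4.0, 2.0], [0.0, 2.0]])
```

Only columns j:n of the two rows are touched in each rotation, since the earlier columns of those rows are already zero.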

Other sequences of rotations can be used to upper triangularize A. For
example, if we replace the for statements in Algorithm 5.2.2 with

    for i = m:-1:2
        for j = 1:min{i - 1, n}

then the zeros in A are introduced row-by-row.

Another parameter in a Givens QR procedure concerns the planes of
rotation that are involved in the zeroing of each $a_{ij}$. For example, instead
of rotating rows i - 1 and i to zero $a_{ij}$ as in Algorithm 5.2.2, we could use
rows j and i:

    for j = 1:n
        for i = m:-1:j + 1
            [c, s] = givens(A(j, j), A(i, j))
            A([j i], j:n) = [c s; -s c]^T A([j i], j:n)
        end
    end

5.2.4 Hessenberg QR via Givens


As an example of how Givens rotations can be used in structured problems,
we show how they can be employed to compute the QR factorization of an
upper Hessenberg matrix. A small example illustrates the general idea.
Suppose n = 6 and that after two steps we have computed

$$G(2,3,\theta_2)^T G(1,2,\theta_1)^T A = \begin{bmatrix} \times & \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \end{bmatrix}$$

We then compute $G(3,4,\theta_3)$ to zero the current (4,3) entry thereby obtaining

$$G(3,4,\theta_3)^T G(2,3,\theta_2)^T G(1,2,\theta_1)^T A = \begin{bmatrix} \times & \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \end{bmatrix}$$

Overall we have

Algorithm 5.2.3 (Hessenberg QR) If $A \in \mathbb{R}^{n \times n}$ is upper Hessenberg,
then the following algorithm overwrites A with $Q^T A = R$ where Q is orthogonal
and R is upper triangular. $Q = G_1 \cdots G_{n-1}$ is a product of Givens
rotations where $G_j$ has the form $G_j = G(j, j + 1, \theta_j)$.

    for j = 1:n - 1
        [c, s] = givens(A(j, j), A(j + 1, j))
        A(j:j + 1, j:n) = [c s; -s c]^T A(j:j + 1, j:n)
    end

This algorithm requires about $3n^2$ flops.
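
The Hessenberg case needs only one rotation per column, as the following Python sketch suggests. Again this is an illustration with 0-based indices, a repeated `givens` helper, and a made-up 3-by-3 upper Hessenberg matrix.

```python
import math

def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]]^T applied to [a, b] giving [r, 0]."""
    if b == 0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / math.sqrt(1.0 + tau * tau)
    return c, c * tau

def hessenberg_qr(A):
    """Overwrite an upper Hessenberg A (n-by-n list of rows) with R = Q^T A."""
    n = len(A)
    for j in range(n - 1):
        c, s = givens(A[j][j], A[j + 1][j])    # zero the single subdiagonal entry
        for col in range(j, n):
            t1, t2 = A[j][col], A[j + 1][col]
            A[j][col] = c * t1 - s * t2
            A[j + 1][col] = s * t1 + c * t2
    return A

H = hessenberg_qr([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.0, 7.0, 8.0]])
```

With n - 1 rotations, each touching at most n columns of two rows, the work is quadratic in n rather than cubic.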


5.2.5 Fast Givens QR


We can use the fast Givens transformations described in §5.1.13 to compute
an (M, D) representation of Q. In particular, if M is nonsingular and D
is diagonal such that $M^T A = T$ is upper triangular and $M^T M = D$ is
diagonal, then $Q = MD^{-1/2}$ is orthogonal and $Q^T A = D^{-1/2} T \equiv R$ is
upper triangular. Analogous to the Givens QR procedure we have:


Algorithm 5.2.4 (Fast Givens QR) Given $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, the
following algorithm computes nonsingular $M \in \mathbb{R}^{m \times m}$ and positive d(1:m)
such that $M^T A = T$ is upper triangular, and $M^T M = \mathrm{diag}(d_1, \ldots, d_m)$. A
is overwritten by T. Note: $A = (MD^{-1/2})(D^{-1/2}T)$ is a QR factorization
of A.

    for i = 1:m
        d(i) = 1
    end
    for j = 1:n
        for i = m:-1:j + 1
            [α, β, type] = fast.givens(A(i - 1:i, j), d(i - 1:i))
            if type = 1
                A(i - 1:i, j:n) = [β 1; 1 α]^T A(i - 1:i, j:n)
            else
                A(i - 1:i, j:n) = [1 α; β 1]^T A(i - 1:i, j:n)
            end
        end
    end


This algorithm requires $2n^2(m - n/3)$ flops. As we mentioned in the previous
section, it is necessary to guard against overflow in fast Givens algorithms
such as the above. This means that M, D, and A must be periodically
scaled if their entries become large.

If the QR factorization of a narrow band matrix is required, then the
fast Givens approach is attractive because it involves no square roots. (We
found $LDL^T$ preferable to Cholesky in the narrow band case for the same
reason; see §4.3.6.) In particular, if $A \in \mathbb{R}^{n \times n}$ has upper bandwidth q and
lower bandwidth p, then $Q^T A = R$ has upper bandwidth p + q. In this
case Givens QR requires about O(np(p + q)) flops and O(np) square roots.
Thus, the square roots are a significant portion of the overall computation
if $p, q \ll n$.


5.2.6 Properties of the QR Factorization 


The above algorithms “prove” that the QR factorization exists. Now we
relate the columns of Q to ran(A) and $\mathrm{ran}(A)^{\perp}$ and examine the uniqueness
question.


Theorem 5.2.1 If A = QR is a QR factorization of a full column rank
$A \in \mathbb{R}^{m \times n}$ and $A = [a_1, \ldots, a_n]$ and $Q = [q_1, \ldots, q_m]$ are column partitionings,
then

$$\mathrm{span}\{a_1, \ldots, a_k\} = \mathrm{span}\{q_1, \ldots, q_k\}, \qquad k = 1{:}n.$$

In particular, if $Q_1 = Q(1{:}m, 1{:}n)$ and $Q_2 = Q(1{:}m, n+1{:}m)$ then

$$\mathrm{ran}(A) = \mathrm{ran}(Q_1)$$
$$\mathrm{ran}(A)^{\perp} = \mathrm{ran}(Q_2)$$

and $A = Q_1 R_1$ with $R_1 = R(1{:}n, 1{:}n)$.

Proof. Comparing kth columns in A = QR we conclude that

$$a_k = \sum_{i=1}^{k} r_{ik} q_i \in \mathrm{span}\{q_1, \ldots, q_k\}. \qquad (5.2.2)$$

Thus, $\mathrm{span}\{a_1, \ldots, a_k\} \subseteq \mathrm{span}\{q_1, \ldots, q_k\}$. However, since rank(A) =
n it follows that $\mathrm{span}\{a_1, \ldots, a_k\}$ has dimension k and so must equal
$\mathrm{span}\{q_1, \ldots, q_k\}$. The rest of the theorem follows trivially. □


The matrices $Q_1 = Q(1{:}m, 1{:}n)$ and $Q_2 = Q(1{:}m, n+1{:}m)$ can be easily
computed from a factored form representation of Q.

If A = QR is a QR factorization of $A \in \mathbb{R}^{m \times n}$ and $m \ge n$, then we refer
to A = Q(:, 1:n) R(1:n, 1:n) as the thin QR factorization. The next result
addresses the uniqueness issue for the thin QR factorization.


Theorem 5.2.2 Suppose $A \in \mathbb{R}^{m \times n}$ has full column rank. The thin QR
factorization

$$A = Q_1 R_1$$

is unique where $Q_1 \in \mathbb{R}^{m \times n}$ has orthonormal columns and $R_1$ is upper triangular
with positive diagonal entries. Moreover, $R_1 = G^T$ where G is the
lower triangular Cholesky factor of $A^T A$.


Proof. Since $A^T A = (Q_1 R_1)^T (Q_1 R_1) = R_1^T R_1$ we see that $G = R_1^T$ is the
Cholesky factor of $A^T A$. This factor is unique by Theorem 4.2.5. Since
$Q_1 = A R_1^{-1}$ it follows that $Q_1$ is also unique. □


How are $Q_1$ and $R_1$ affected by perturbations in A? To answer this
question we need to extend the notion of condition to rectangular matrices.
Recall from §2.7.3 that the 2-norm condition of a square nonsingular matrix
is the ratio of the largest and smallest singular values. For rectangular
matrices with full column rank we continue with this definition:

$$A \in \mathbb{R}^{m \times n},\ \mathrm{rank}(A) = n \implies \kappa_2(A) = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}.$$

If the columns of A are nearly dependent, then $\kappa_2(A)$ is large. Stewart
(1993) has shown that $O(\epsilon)$ relative error in A induces $O(\epsilon\,\kappa_2(A))$ relative
error in R and $Q_1$.


5.2.7 Classical Gram-Schmidt 


We now discuss two alternative methods that can be used to compute the
thin QR factorization $A = Q_1 R_1$ directly. If rank(A) = n, then equation
(5.2.2) can be solved for $q_k$:

$$q_k = \left( a_k - \sum_{i=1}^{k-1} r_{ik} q_i \right) \Big/ r_{kk}.$$

Thus, we can think of $q_k$ as a unit 2-norm vector in the direction of

$$z_k = a_k - \sum_{i=1}^{k-1} r_{ik} q_i$$

where to ensure $z_k \in \mathrm{span}\{q_1, \ldots, q_{k-1}\}^{\perp}$ we choose

$$r_{ik} = q_i^T a_k, \qquad i = 1{:}k-1.$$

This leads to the classical Gram-Schmidt (CGS) algorithm for computing
$A = Q_1 R_1$.

    R(1, 1) = || A(:, 1) ||_2
    Q(:, 1) = A(:, 1)/R(1, 1)
    for k = 2:n
        R(1:k - 1, k) = Q(1:m, 1:k - 1)^T A(1:m, k)
        z = A(1:m, k) - Q(1:m, 1:k - 1) R(1:k - 1, k)              (5.2.3)
        R(k, k) = || z ||_2
        Q(1:m, k) = z/R(k, k)
    end

In the kth step of CGS, the kth columns of both Q and R are generated.
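
The CGS recurrence (5.2.3) can be sketched in plain Python as follows. The function name `classical_gram_schmidt` and the 3-by-2 test matrix are assumptions made for this illustration; the input is a list of rows whose columns are assumed to be independent.

```python
import math

def classical_gram_schmidt(A):
    """Return (Q, R) with A = Q R, Q m-by-n with orthonormal columns, R n-by-n."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        a = [A[t][k] for t in range(m)]
        # Every r_ik = q_i^T a_k is formed from the ORIGINAL column a_k.
        for i in range(k):
            R[i][k] = sum(Q[t][i] * a[t] for t in range(m))
        z = [a[t] - sum(Q[t][i] * R[i][k] for i in range(k)) for t in range(m)]
        R[k][k] = math.sqrt(sum(t * t for t in z))
        for t in range(m):
            Q[t][k] = z[t] / R[k][k]
    return Q, R

A0 = [[3.0, 1.0], [4.0, 2.0], [0.0, 2.0]]
Q, R = classical_gram_schmidt(A0)
```

For k = 1 the inner sums are empty, so the first pass reproduces the two initialization lines of the pseudocode.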


5.2.8 Modified Gram-Schmidt


Unfortunately, the CGS method has very poor numerical properties in that
there is typically a severe loss of orthogonality among the computed $q_i$.
Interestingly, a rearrangement of the calculation, known as modified Gram-Schmidt
(MGS), yields a much sounder computational procedure. In the
kth step of MGS, the kth column of Q (denoted by $q_k$) and the kth row of
R (denoted by $r_k^T$) are determined. To derive the MGS method, define the
matrix $A^{(k)} \in \mathbb{R}^{m \times (n-k+1)}$ by

$$A - \sum_{i=1}^{k-1} q_i r_i^T = \sum_{i=k}^{n} q_i r_i^T = [\,0 \;\; A^{(k)}\,]. \qquad (5.2.4)$$

It follows that if

$$A^{(k)} = [\, z \;\; B \,]$$

where z is a single column and B has n - k columns, then $r_{kk} = \| z \|_2$,
$q_k = z/r_{kk}$ and $(r_{k,k+1} \cdots r_{kn}) = q_k^T B$. We then
compute the outer product $A^{(k+1)} = B - q_k (r_{k,k+1} \cdots r_{kn})$ and proceed
to the next step. This completely describes the kth step of MGS.


Algorithm 5.2.5 (Modified Gram-Schmidt) Given $A \in \mathbb{R}^{m \times n}$ with
rank(A) = n, the following algorithm computes the factorization $A = Q_1 R_1$
where $Q_1 \in \mathbb{R}^{m \times n}$ has orthonormal columns and $R_1 \in \mathbb{R}^{n \times n}$ is upper triangular.

    for k = 1:n
        R(k, k) = || A(1:m, k) ||_2
        Q(1:m, k) = A(1:m, k)/R(k, k)
        for j = k + 1:n
            R(k, j) = Q(1:m, k)^T A(1:m, j)
            A(1:m, j) = A(1:m, j) - Q(1:m, k) R(k, j)
        end
    end


This algorithm requires $2mn^2$ flops. It is not possible to overwrite A with
both $Q_1$ and $R_1$. Typically, the MGS computation is arranged so that A is
overwritten by $Q_1$ and the matrix $R_1$ is stored in a separate array.
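
A Python sketch of Algorithm 5.2.5 follows. The name `modified_gram_schmidt` is hypothetical; for clarity the sketch works on a copy of A and returns Q and R separately rather than overwriting A with Q as a production code might.

```python
import math

def modified_gram_schmidt(A):
    """Return (Q, R) with A = Q R, generating column k of Q and row k of R at step k."""
    m, n = len(A), len(A[0])
    A = [row[:] for row in A]                  # work on a copy of the input
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        R[k][k] = math.sqrt(sum(A[t][k] ** 2 for t in range(m)))
        for t in range(m):
            Q[t][k] = A[t][k] / R[k][k]
        for j in range(k + 1, n):
            # r_kj uses the UPDATED column j, unlike classical Gram-Schmidt.
            R[k][j] = sum(Q[t][k] * A[t][j] for t in range(m))
            for t in range(m):
                A[t][j] -= Q[t][k] * R[k][j]
    return Q, R

A0 = [[3.0, 1.0], [4.0, 2.0], [0.0, 2.0]]
Q, R = modified_gram_schmidt(A0)
```

The only difference from CGS is which version of column j enters the inner product, and that difference is what improves the numerical behavior.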


5.2.9 Work and Accuracy 


If one is interested in computing an orthonormal basis for ran(A), then
the Householder approach requires $2mn^2 - 2n^3/3$ flops to get Q in factored
form and another $2mn^2 - 2n^3/3$ flops to get the first n columns of
Q. (This requires “paying attention” to just the first n columns of Q in
(5.1.5).) Therefore, for the problem of finding an orthonormal basis for
ran(A), MGS is about twice as efficient as Householder orthogonalization.
However, Björck (1967) has shown that MGS produces a computed $\hat{Q}_1 = [\hat{q}_1, \ldots, \hat{q}_n]$
that satisfies

$$\hat{Q}_1^T \hat{Q}_1 = I + E_{MGS}, \qquad \| E_{MGS} \|_2 \approx u\,\kappa_2(A)$$

whereas the corresponding result for the Householder approach is of the
form

$$\hat{Q}_1^T \hat{Q}_1 = I + E_H, \qquad \| E_H \|_2 \approx u.$$

Thus, if orthonormality is critical, then MGS should be used to compute
orthonormal bases only when the vectors to be orthogonalized are fairly
independent.

We also mention that the computed triangular factor $\hat{R}$ produced by
MGS satisfies $\| A - \hat{Q}\hat{R} \| \approx u \| A \|$ and that there exists a Q with perfectly
orthonormal columns such that $\| A - Q\hat{R} \| \approx u \| A \|$. See Higham (1996,
p.379).


Example 5.2.1 If modified Gram-Schmidt is applied to

$$A = \begin{bmatrix} 1 & 1 \\ 10^{-3} & 0 \\ 0 & 10^{-3} \end{bmatrix}, \qquad \kappa_2(A) \approx 1.4 \cdot 10^3$$

with 6-digit decimal arithmetic, then

$$[\,\hat{q}_1 \;\; \hat{q}_2\,] = \begin{bmatrix} 1.00000 & 0 \\ .001 & -.707107 \\ 0 & .707100 \end{bmatrix}.$$


5.2.10 A Note on Complex QR


Most of the algorithms that we present in this book have complex versions
that are fairly straightforward to derive from their real counterparts.
(This is not to say that everything is easy and obvious at the implementation
level.) As an illustration we outline what a complex Householder QR
factorization algorithm looks like.

Starting at the level of an individual Householder transformation, suppose
$0 \neq x \in \mathbb{C}^n$ and that $x_1 = re^{i\theta}$ where $r, \theta \in \mathbb{R}$. If $v = x \pm e^{i\theta} \| x \|_2 e_1$
and $P = I_n - \beta vv^H$, $\beta = 2/v^H v$, then $Px = \mp e^{i\theta} \| x \|_2 e_1$. (See P5.1.3.)
The sign can be determined to maximize $\| v \|_2$ for the sake of stability.

The upper triangularization of $A \in \mathbb{C}^{m \times n}$, $m \ge n$, proceeds as in Algorithm
5.2.1. In step j we zero the subdiagonal portion of A(j:m, j):

    for j = 1:n
        x = A(j:m, j)
        v = x ± e^{iθ} || x ||_2 e_1 where x_1 = re^{iθ}
        β = 2/v^H v
        A(j:m, j:n) = (I_{m-j+1} - βvv^H) A(j:m, j:n)
    end

The reduction involves $8n^2(m - n/3)$ real flops, four times the number
required to execute Algorithm 5.2.1. If $Q = P_1 \cdots P_n$ is the product of the
Householder transformations, then Q is unitary and $Q^H A = R \in \mathbb{C}^{m \times n}$ is
complex and upper triangular.
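
A single complex Householder reflection can be checked numerically with Python's built-in complex type. In this sketch the helper names `complex_house` and `apply_reflector` are hypothetical, the plus sign is taken in $v = x + e^{i\theta}\| x \|_2 e_1$ (with $r = |x_1| \ge 0$ this is the choice that maximizes $\| v \|_2$), and the test vector is made up.

```python
import cmath
import math

def complex_house(x):
    """Return (v, beta) with (I - beta v v^H) x = -e^{i theta} ||x||_2 e_1."""
    norm = math.sqrt(sum(abs(t) ** 2 for t in x))
    phase = cmath.exp(1j * cmath.phase(x[0]))   # e^{i theta}, where x_1 = |x_1| e^{i theta}
    v = [x[0] + phase * norm] + list(x[1:])
    vhv = sum(abs(t) ** 2 for t in v)
    beta = 2.0 / vhv if vhv > 0 else 0.0        # beta = 0 only when x = 0
    return v, beta

def apply_reflector(v, beta, x):
    """Return (I - beta v v^H) x without forming the matrix."""
    w = beta * sum(vi.conjugate() * xi for vi, xi in zip(v, x))   # beta * v^H x
    return [xi - w * vi for vi, xi in zip(v, x)]

x = [3 + 4j, 0j, 12 + 0j]        # ||x||_2 = 13 and e^{i theta} = 0.6 + 0.8i
v, beta = complex_house(x)
y = apply_reflector(v, beta, x)  # expected: -(0.6 + 0.8i) * 13 in the first slot
```

Replacing transposes by conjugate transposes is the only structural change from the real case; the extra cost comes from the complex arithmetic itself.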


Problems


P5.2.1 Adapt the Householder QR algorithm so that it can efficiently handle the case
when $A \in \mathbb{R}^{m \times n}$ has lower bandwidth p and upper bandwidth q.


P5.2.2 Adapt the Householder QR algorithm so that it computes the factorization
A = QL where L is lower triangular and Q is orthogonal. Assume that A is square. This
involves rewriting the Householder vector function v = house(x) so that $(I - 2vv^T/v^T v)x$
is zero everywhere but its bottom component.


P5.2.3 Adapt the Givens QR factorization algorithm so that the zeros are introduced by
diagonal. That is, the entries are zeroed in the order (m, 1), (m - 1, 1), (m, 2), (m - 2, 1),
(m - 1, 2), (m, 3), etc.


P5.2.4 Adapt the fast Givens QR factorization algorithm so that it efficiently handles
the case when A is n-by-n and tridiagonal. Assume that the subdiagonal, diagonal, and
superdiagonal of A are stored in e(1:n - 1), a(1:n), f(1:n - 1) respectively. Design your
algorithm so that these vectors are overwritten by the nonzero portion of T.

P5.2.5 Suppose $L \in \mathbb{R}^{m \times n}$ with $m \ge n$ is lower triangular. Show how Householder
matrices $H_1, \ldots, H_n$ can be used to determine a lower triangular $L_1 \in \mathbb{R}^{n \times n}$ so that

$$H_n \cdots H_1 L = \begin{bmatrix} L_1 \\ 0 \end{bmatrix}$$

Hint: The second step in the 6-by-3 case involves finding $H_2$ so that

$$H_2 \begin{bmatrix} \times & 0 & 0 \\ \times & \times & 0 \\ \times & \times & \times \\ \times & \times & 0 \\ \times & \times & 0 \\ \times & \times & 0 \end{bmatrix} = \begin{bmatrix} \times & 0 & 0 \\ \times & \times & 0 \\ \times & \times & \times \\ \times & 0 & 0 \\ \times & 0 & 0 \\ \times & 0 & 0 \end{bmatrix}$$

with the property that rows 1 and 3 are left alone.

P5.2.6 Show that if

$$A = \begin{bmatrix} R & w \\ 0 & v \end{bmatrix}, \qquad b = \begin{bmatrix} c \\ d \end{bmatrix}$$

where R is k-by-k, w and c are k-vectors, v and d are (m - k)-vectors,
and A has full column rank, then $\min \| Ax - b \|_2^2 = \| d \|_2^2 - (v^T d/\| v \|_2)^2$.


P5.2.7 Suppose $A \in \mathbb{R}^{n \times n}$ and $D = \mathrm{diag}(d_1, \ldots, d_n) \in \mathbb{R}^{n \times n}$. Show how to construct
an orthogonal Q such that $Q^T A - DQ^T = R$ is upper triangular. Do not worry about
efficiency; this is just an exercise in QR manipulation.

P5.2.8 Show how to compute the QR factorization of the product $A = A_p \cdots A_2 A_1$
without explicitly multiplying the matrices $A_1, \ldots, A_p$ together. Hint: In the p =
3 case, write $Q_3^T A = Q_3^T A_3 Q_2 Q_2^T A_2 Q_1 Q_1^T A_1$ and determine orthogonal $Q_i$ so that
$Q_i^T (A_i Q_{i-1})$ is upper triangular. ($Q_0 = I$.)

P5.2.9 Suppose $A \in \mathbb{R}^{n \times n}$ and let E be the permutation obtained by reversing the
order of the rows in $I_n$. (This is just the exchange matrix of §4.7.) (a) Show that if
$R \in \mathbb{R}^{n \times n}$ is upper triangular, then L = ERE is lower triangular. (b) Show how to
compute an orthogonal $Q \in \mathbb{R}^{n \times n}$ and a lower triangular $L \in \mathbb{R}^{n \times n}$ so that A = QL
assuming the availability of a procedure for computing the QR factorization.

P5.2.10 MGS applied to $A \in \mathbb{R}^{m \times n}$ is numerically equivalent to the first step in Householder
QR applied to

$$\tilde{A} = \begin{bmatrix} O_n \\ A \end{bmatrix}$$

where $O_n$ is the n-by-n zero matrix. Verify that this statement is true after the first
step of each method is completed.

P5.2.11 Reverse the loop orders in Algorithm 5.2.5 (MGS QR) so that R is computed
column-by-column.

P5.2.12 Develop a complex version of the Givens QR factorization. Refer to P5.1.5,
where complex Givens rotations are the theme. Is it possible to organize the calculations
so that the diagonal elements of R are nonnegative?


Notes and References for Sec. 5.2 
The idea of using Householder transformations to solve the LS problem wes proposed in 


A.S. Householder (1958). “Unitary Triangularization of a Nonsymmetric Matrix,” J. 
ACM. 5, 339-42. 


The practical details were worked out in


P. Businger and G.H. Golub (1965). “Linear Least Squares Solutions by Householder
Transformations,” Numer. Math. 7, 269-76. See also Wilkinson and Reinsch
(1971, 111-18).

G.H. Golub (1965). “Numerical Methods for Solving Linear Least Squares Problems,”
Numer. Math. 7, 206-16.

W. Givens (1958). “Computation of Plane Unitary Rotations Transforming a General
Matrix to Triangular Form,” SIAM J. App. Math. 6, 26-50.

M. Gentleman (1973). “Error Analysis of QR Decompositions by Givens Transformations,”
Lin. Alg. and Its Applic. 10, 189-97.


For a discussion of how the QR factorization can be used to solve numerous problems in
statistical computation, see

G.H. Golub (1969). “Matrix Decompositions and Statistical Computation,” in Statistical
Computation, ed. R.C. Milton and J.A. Nelder, Academic Press, New York, pp.
365-97.


The behavior of the Q and R factors when A is perturbed is discussed in 


G.W. Stewart (1977). “Perturbation Bounds for the QR Factorization of a Matrix,”
SIAM J. Num. Anal. 14, 509-18.

H. Zha (1993). “A Componentwise Perturbation Analysis of the QR Decomposition,”
SIAM J. Matrix Anal. Appl. 14, 1124-1131.

G.W. Stewart (1993). “On the Perturbation of LU, Cholesky, and QR Factorizations,”
SIAM J. Matrix Anal. Appl. 14, 1141-1145.

A. Barrlund (1994). “Perturbation Bounds for the Generalized QR Factorization,” Lin.
Alg. and Its Applic. 207, 251-271.

J.-G. Sun (1995). “On Perturbation Bounds for the QR Factorization,” Lin. Alg. and
Its Applic. 215, 95-112.


The main result is that the changes in Q and R are bounded by the condition of A times
the relative change in A. Organizing the computation so that the entries in Q depend
continuously on the entries in A is discussed in


T.F. Coleman and D.C. Sorensen (1984). “A Note on the Computation of an Orthonormal
Basis for the Null Space of a Matrix,” Mathematical Programming 29, 234-242.


References for the Gram-Schmidt process include


J.R. Rice (1966). “Experiments on Gram-Schmidt Orthogonalization,” Math. Comp.
20, 325-28.

A. Björck (1967). “Solving Linear Least Squares Problems by Gram-Schmidt Orthogonalization,”
BIT 7, 1-21.

N.N. Abdelmalek (1971). “Roundoff Error Analysis for Gram-Schmidt Method and
Solution of Linear Least Squares Problems,” BIT 11, 345-68.

J. Daniel, W.B. Gragg, L. Kaufman, and G.W. Stewart (1976). “Reorthogonalization
and Stable Algorithms for Updating the Gram-Schmidt QR Factorization,” Math.
Comp. 30, 772-95.

A. Ruhe (1983). “Numerical Aspects of Gram-Schmidt Orthogonalization of Vectors,”
Gram-Schmidt Algorithm,” SIAM J. Sci. Stat. Comp. 12, 1058-1073.

A. Björck and C.C. Paige (1992). “Loss and Recapture of Orthogonality in the Modified
Gram-Schmidt Algorithm,” SIAM J. Matrix Anal. Appl. 13, 176-190.

A. Björck (1994). “Numerics of Gram-Schmidt Orthogonalization,” Lin. Alg. and Its


The QR factorization of a structured matrix is usually structured itself. See


A.W. Bojanczyk, R.P. Brent, and F.R. de Hoog (1988). “QR Factorization of Toeplitz
Matrices,” Numer. Math. 49, 81-94.

S. Qiao (1986). “Hybrid Algorithm for Fast Toeplitz Orthogonalization,” Numer. Math.
53, 351-366.

C.J. Demeure (1989). “Fast QR Factorization of Vandermonde Matrices,” Lin. Alg.


Various high-performance issues pertaining to the QR factorization are discussed in


B. Mattingly, C. Meyer, and J. Ortega (1989). “Orthogonal Reduction on Vector Computers,”
SIAM J. Sci. and Stat. Comp. 10, 372-381.

P.A. Knight (1995). “Fast Rectangular Matrix Multiplication and the QR Decomposition,”
Lin. Alg. and Its Applic. 221, 69-81.


5.3 The Full Rank LS Problem 


Consider the problem of finding a vector $x \in \mathbb{R}^n$ such that $Ax = b$ where
the data matrix $A \in \mathbb{R}^{m \times n}$ and the observation vector $b \in \mathbb{R}^m$ are given and
$m \ge n$. When there are more equations than unknowns, we say that the
system $Ax = b$ is overdetermined. Usually an overdetermined system has
no exact solution since $b$ must be an element of ran($A$), a proper subspace
of $\mathbb{R}^m$.

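This suggests that we strive to minimize $\| Ax - b \|_p$ for some suitable
choice of $p$. Different norms render different optimum solutions. For example,
if $A = [1, 1, 1]^T$ and $b = [b_1, b_2, b_3]^T$ with $b_1 \ge b_2 \ge b_3 \ge 0$, then it
can be verified that

$$
\begin{aligned}
p = 1 \quad &\Rightarrow \quad x_{opt} = b_2 \\
p = 2 \quad &\Rightarrow \quad x_{opt} = (b_1 + b_2 + b_3)/3 \\
p = \infty \quad &\Rightarrow \quad x_{opt} = (b_1 + b_3)/2.
\end{aligned}
$$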
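The three closed forms above (median, mean, and midrange of the entries of $b$) can be checked numerically. The following is a minimal sketch, not part of the original text; the function names `residual_norm` and `closed_form_minimizers` are hypothetical, and the check simply compares the objective at each closed-form value against a grid of trial values.

```python
def residual_norm(x, b, p):
    """Return || x*[1,1,1]^T - b ||_p for p in {1, 2, "inf"}."""
    r = [abs(x - bi) for bi in b]
    if p == 1:
        return sum(r)
    if p == 2:
        return sum(t * t for t in r) ** 0.5
    return max(r)  # p == "inf"


def closed_form_minimizers(b):
    """Median, mean and midrange of the three entries of b."""
    hi, mid, lo = sorted(b, reverse=True)
    return {1: mid, 2: (hi + mid + lo) / 3.0, "inf": (hi + lo) / 2.0}
```

For instance, with `b = [5.0, 2.0, 1.0]` the candidates are 2 for the 1-norm, 8/3 for the 2-norm, and 3 for the ∞-norm, and evaluating `residual_norm` at nearby values of `x` never gives a smaller result.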


Minimization in the 1-norm and ∞-norm is complicated by the fact that
the function $f(x) = \| Ax - b \|_p$ is not differentiable for these values of
$p$. However, much progress has been made in this area, and there are
several good techniques available for 1-norm and ∞-norm minimization.
See Coleman and Li (1992), Li (1993), and Zhang (1993).

In contrast to general p-norm minimization, the least squares (LS) problem

$$
\min_{x \in \mathbb{R}^n} \| Ax - b \|_2 \tag{5.3.1}
$$


is more tractable for two reasons: 


• $\phi(x) = \frac{1}{2} \| Ax - b \|_2^2$ is a differentiable function of $x$ and so the
minimizers of $\phi$ satisfy the gradient equation $\nabla\phi(x) = 0$. This turns out
to be an easily constructed symmetric linear system which is positive
definite if $A$ has full column rank.

• The 2-norm is preserved under orthogonal transformation. This means
that we can seek an orthogonal $Q$ such that the equivalent problem
of minimizing $\| (Q^T A)x - (Q^T b) \|_2$ is “easy” to solve.


In this section we pursue these two solution approaches for the case when 
A has full column rank. Methods based on normal equations and the QR 
factorization are detailed and compared. 


5.3.1 Implications of Full Rank 
Suppose $x \in \mathbb{R}^n$, $z \in \mathbb{R}^n$, and $\alpha \in \mathbb{R}$ and consider the equality

$$
\| A(x + \alpha z) - b \|_2^2 = \| Ax - b \|_2^2 + 2\alpha z^T A^T (Ax - b) + \alpha^2 \| Az \|_2^2
$$

where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. If $x$ solves the LS problem (5.3.1) then
we must have $A^T(Ax - b) = 0$. Otherwise, if $z = -A^T(Ax - b)$ and
we make $\alpha$ small enough, then we obtain the contradictory inequality
$\| A(x + \alpha z) - b \|_2 < \| Ax - b \|_2$. We may also conclude that if $x$ and
$x + \alpha z$ are LS minimizers, then $z \in$ null($A$).

Thus, if $A$ has full column rank, then there is a unique LS solution $x_{LS}$
and it solves the symmetric positive definite linear system

$$
A^T A x_{LS} = A^T b.
$$

These are called the normal equations. Since $\nabla\phi(x) = A^T(Ax - b)$ where
$\phi(x) = \frac{1}{2} \| Ax - b \|_2^2$, we see that solving the normal equations is
tantamount to solving the gradient equation $\nabla\phi = 0$. We call

$$
r_{LS} = b - A x_{LS}
$$

the minimum residual and we use the notation

$$
\rho_{LS} = \| A x_{LS} - b \|_2
$$

to denote its size. Note that if $\rho_{LS}$ is small, then we can “predict” $b$ with
the columns of $A$.

So far we have been assuming that $A \in \mathbb{R}^{m \times n}$ has full column rank.
This assumption is dropped in §5.5. However, even if rank($A$) = $n$, then
we can expect trouble in the above procedures if $A$ is nearly rank deficient.

When assessing the quality of a computed LS solution $\hat{x}_{LS}$, there are
two important issues to bear in mind:


• How close is $\hat{x}_{LS}$ to $x_{LS}$?


• How small is $\hat{r}_{LS} = b - A\hat{x}_{LS}$ compared to $r_{LS} = b - Ax_{LS}$?

The relative importance of these two criteria varies from application to
application. In any case it is important to understand how $x_{LS}$ and $r_{LS}$
are affected by perturbations in $A$ and $b$. Our intuition tells us that if
the columns of $A$ are nearly dependent, then these quantities may be quite
sensitive.


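Example 5.3.1 Suppose

$$
A = \begin{bmatrix} 1 & 0 \\ 0 & 10^{-6} \\ 0 & 0 \end{bmatrix}, \quad
\delta A = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 10^{-8} \end{bmatrix}, \quad
b = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad
\delta b = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
$$

and that $x_{LS}$ and $\hat{x}_{LS}$ minimize $\| Ax - b \|_2$ and $\| (A + \delta A)x - (b + \delta b) \|_2$ respectively.
Let $r_{LS}$ and $\hat{r}_{LS}$ be the corresponding minimum residuals. Then

$$
x_{LS} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
\hat{x}_{LS} = \begin{bmatrix} 1 \\ .9999 \cdot 10^{4} \end{bmatrix}, \quad
r_{LS} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad
\hat{r}_{LS} = \begin{bmatrix} 0 \\ -.9999 \cdot 10^{-2} \\ .9999 \cdot 10^{0} \end{bmatrix}.
$$

Since $\kappa_2(A) = 10^6$ we have

$$
\frac{\| \hat{x}_{LS} - x_{LS} \|_2}{\| x_{LS} \|_2} \approx .9999 \cdot 10^{4}
\le \kappa_2(A)^2 \frac{\| \delta A \|_2}{\| A \|_2} = 10^{12} \cdot 10^{-8}
$$

and

$$
\frac{\| \hat{r}_{LS} - r_{LS} \|_2}{\| b \|_2} \approx .7070 \cdot 10^{-2}
\le \kappa_2(A) \frac{\| \delta A \|_2}{\| A \|_2} = 10^{6} \cdot 10^{-8}.
$$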
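The numbers in Example 5.3.1 can be reproduced without any rounding by working in exact rational arithmetic. The sketch below is an illustration only, not part of the original text: the helper names `ls_two_columns` and `norm2` are hypothetical, and the 2-by-2 normal equations are solved by Cramer's rule purely because exact arithmetic makes that safe here.

```python
from fractions import Fraction as F


def ls_two_columns(A, b):
    """Exact LS solution and residual for an m-by-2 matrix A of Fractions.

    The 2-by-2 normal equations are solved by Cramer's rule; exact rational
    arithmetic sidesteps the rounding issues discussed in the text.
    """
    c11 = sum(r[0] * r[0] for r in A)
    c12 = sum(r[0] * r[1] for r in A)
    c22 = sum(r[1] * r[1] for r in A)
    d1 = sum(r[0] * bi for r, bi in zip(A, b))
    d2 = sum(r[1] * bi for r, bi in zip(A, b))
    det = c11 * c22 - c12 * c12  # nonzero when rank(A) = 2
    x = [(c22 * d1 - c12 * d2) / det, (c11 * d2 - c12 * d1) / det]
    res = [bi - (r[0] * x[0] + r[1] * x[1]) for r, bi in zip(A, b)]
    return x, res


def norm2(v):
    """Euclidean norm, converted to float only at the very end."""
    return float(sum(t * t for t in v)) ** 0.5


# Data of Example 5.3.1 (delta b = 0, so b is unchanged).
A = [[F(1), F(0)], [F(0), F(1, 10**6)], [F(0), F(0)]]
dA = [[F(0), F(0)], [F(0), F(0)], [F(0), F(1, 10**8)]]
b = [F(1), F(0), F(1)]
A_hat = [[a + e for a, e in zip(ra, re)] for ra, re in zip(A, dA)]

x, r = ls_two_columns(A, b)
x_hat, r_hat = ls_two_columns(A_hat, b)

rel_x = norm2([p - q for p, q in zip(x_hat, x)]) / norm2(x)
rel_r = norm2([p - q for p, q in zip(r_hat, r)]) / norm2(b)
kappa, rel_dA = 1e6, 1e-8  # kappa_2(A) and ||dA||_2 / ||A||_2 from the example
```

Here `rel_x` comes out just below `kappa**2 * rel_dA` while `rel_r` stays below `kappa * rel_dA`, which is the contrast the example is meant to show.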


The example suggests that the sensitivity of $x_{LS}$ depends upon $\kappa_2(A)^2$. At
the end of this section we develop a perturbation theory for the LS problem
and the $\kappa_2(A)^2$ factor will return.


5.3.2 The Method of Normal Equations 


The most widely used method for solving the full rank LS problem is the 
method of normal equations. 


Algorithm 5.3.1 (Normal Equations) Given $A \in \mathbb{R}^{m \times n}$ with the property
that rank($A$) = $n$ and $b \in \mathbb{R}^m$, this algorithm computes the solution
$x_{LS}$ to the LS problem min $\| Ax - b \|_2$.


    Compute the lower triangular portion of C = A^T A.
    d = A^T b
    Compute the Cholesky factorization C = G G^T.
    Solve G y = d and G^T x_LS = y.


This algorithm requires $(m + n/3)n^2$ flops. The normal equation approach
is convenient because it relies on standard algorithms: Cholesky factorization,
matrix-matrix multiplication, and matrix-vector multiplication. The
compression of the m-by-n data matrix $A$ into the (typically) much smaller
n-by-n cross-product matrix $C$ is attractive.

Let us consider the accuracy of the computed normal equations solution
$\hat{x}_{LS}$. For clarity, assume that no roundoff errors occur during the formation
of $C = A^T A$ and $d = A^T b$. (On many computers inner products are
accumulated in double precision and so this is not a terribly unfair assumption.)
It follows from what we know about the roundoff properties of the Cholesky
factorization (cf. §4.2.7) that

$$
(A^T A + E)\hat{x}_{LS} = A^T b,
$$

where $\| E \|_2 \approx u \| A^T \|_2 \| A \|_2 \approx u \| A^T A \|_2$ and thus we can expect

$$
\frac{\| \hat{x}_{LS} - x_{LS} \|_2}{\| x_{LS} \|_2} \approx u \kappa_2(A^T A) = u \kappa_2(A)^2. \tag{5.3.2}
$$

In other words, the accuracy of the computed normal equations solution
depends on the square of the condition. This seems to be consistent with
Example 5.3.1 but more refined comments follow in §5.3.9.
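The four steps of Algorithm 5.3.1 can be sketched in plain Python. This is an illustrative sketch rather than the book's code: the function names `cholesky` and `normal_equations_ls` are hypothetical, matrices are assumed to be lists of rows, and no attempt is made to exploit symmetry beyond forming only the lower triangle of $C$.

```python
import math


def cholesky(C):
    """Lower triangular G with C = G G^T; only the lower triangle of C is read."""
    n = len(C)
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = C[j][j] - sum(G[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not numerically positive definite")
        G[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            G[i][j] = (C[i][j] - sum(G[i][k] * G[j][k] for k in range(j))) / G[j][j]
    return G


def normal_equations_ls(A, b):
    """Solve min ||Ax - b||_2 via A^T A x = A^T b (A must have full column rank)."""
    m, n = len(A), len(A[0])
    # Lower triangular portion of C = A^T A, and d = A^T b.
    C = [[sum(A[k][i] * A[k][j] for k in range(m)) if j <= i else 0.0
          for j in range(n)] for i in range(n)]
    d = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    G = cholesky(C)
    # Forward substitution: G y = d.
    y = [0.0] * n
    for i in range(n):
        y[i] = (d[i] - sum(G[i][k] * y[k] for k in range(i))) / G[i][i]
    # Back substitution: G^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(G[k][i] * x[k] for k in range(i + 1, n))) / G[i][i]
    return x
```

Fitting a line to the points (1,1), (2,2), (3,2), for example, means calling `normal_equations_ls([[1, 1], [1, 2], [1, 3]], [1, 2, 2])`, which returns approximately `[0.6667, 0.5]`. The `ValueError` branch is where the breakdown described for ill-conditioned problems would surface.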


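Example 5.3.2 It should be noted that the formation of $A^T A$ can result in a severe
loss of information. If

$$
A = \begin{bmatrix} 1 & 1 \\ 10^{-3} & 0 \\ 0 & 10^{-3} \end{bmatrix}
\quad \text{and} \quad
b = \begin{bmatrix} 2 \\ 10^{-3} \\ 10^{-3} \end{bmatrix}
$$

then $\kappa_2(A) \approx 1.4 \cdot 10^3$, $x_{LS} = [1\ 1]^T$, and $\rho_{LS} = 0$. If the normal equations method is
executed with base 10, $t = 6$ arithmetic, then a divide-by-zero occurs during the solution
process, since

$$
fl(A^T A) = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
$$

is exactly singular. On the other hand, if 7-digit arithmetic is used, then $\hat{x}_{LS} =
[2.000001, 0]^T$ and $\| \hat{x}_{LS} - x_{LS} \|_2 / \| x_{LS} \|_2 \approx u \kappa_2(A)^2$.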
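The loss of information in Example 5.3.2 can be imitated with Python's `decimal` module, which rounds each arithmetic operation to a chosen number of significant decimal digits. This is only a sketch of the effect: the function name `gram_determinant` is hypothetical, and `decimal` is assumed to be an adequate stand-in for the base 10, $t$-digit arithmetic of the example.

```python
from decimal import Decimal, localcontext


def gram_determinant(A, digits):
    """det(fl(A^T A)) for a 2-column A, rounding every operation to `digits` digits."""
    with localcontext() as ctx:
        ctx.prec = digits  # significant decimal digits kept after each operation
        c11 = sum((r[0] * r[0] for r in A), Decimal(0))
        c12 = sum((r[0] * r[1] for r in A), Decimal(0))
        c22 = sum((r[1] * r[1] for r in A), Decimal(0))
        return c11 * c22 - c12 * c12


# Data matrix of Example 5.3.2.
A = [
    [Decimal("1"), Decimal("1")],
    [Decimal("1e-3"), Decimal("0")],
    [Decimal("0"), Decimal("1e-3")],
]
det_6_digits = gram_determinant(A, 6)
det_7_digits = gram_determinant(A, 7)
```

With six digits the diagonal entries $1 + 10^{-6}$ round to 1, so `det_6_digits` is zero and the Cholesky step cannot be completed; with seven digits the determinant is small but nonzero.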


5.3.3 LS Solution Via QR Factorization 


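Let $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and $b \in \mathbb{R}^m$ be given and suppose that an
orthogonal matrix $Q \in \mathbb{R}^{m \times m}$ has been computed such that

$$
Q^T A = R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \tag{5.3.3}
$$

is upper triangular, where $R_1$ has $n$ rows and the zero block has $m - n$ rows. If

$$
Q^T b = \begin{bmatrix} c \\ d \end{bmatrix}
$$

with $c \in \mathbb{R}^n$ and $d \in \mathbb{R}^{m-n}$, then

$$
\| Ax - b \|_2^2 = \| Q^T A x - Q^T b \|_2^2 = \| R_1 x - c \|_2^2 + \| d \|_2^2
$$

for any $x \in \mathbb{R}^n$. Clearly, if rank($A$) = rank($R_1$) = $n$, then $x_{LS}$ is defined
by the upper triangular system $R_1 x_{LS} = c$. Note that

$$
\rho_{LS} = \| d \|_2.
$$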


We conclude that the full rank LS problem can be readily solved once we
have computed the QR factorization of $A$. Details depend on the exact QR
procedure. If Householder matrices are used and $Q^T$ is applied in factored
form to $b$, then we obtain


Algorithm 5.3.2 (Householder LS Solution) If $A \in \mathbb{R}^{m \times n}$ has full
column rank and $b \in \mathbb{R}^m$, then the following algorithm computes a vector
$x_{LS} \in \mathbb{R}^n$ such that $\| Ax_{LS} - b \|_2$ is minimum.


    Use Algorithm 5.2.1 to overwrite A with its QR factorization.
    for j = 1:n
        v(j) = 1; v(j+1:m) = A(j+1:m, j)
        b(j:m) = (I_{m-j+1} - β_j v(j:m) v(j:m)^T) b(j:m)
    end
    Solve A(1:n, 1:n) x_LS = b(1:n) using back substitution.


This method for solving the full rank LS problem requires $2n^2(m - n/3)$
flops. The $O(mn)$ flops associated with the updating of $b$ and the $O(n^2)$
flops associated with the back substitution are not significant compared to
the work required to factor $A$.
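A compact way to see the QR approach in action is to apply each Householder reflection to $A$ and $b$ as soon as it is generated, then back-substitute. The sketch below is a variant of Algorithm 5.3.2 under stated assumptions, not the book's implementation: it does not store the Householder vectors in factored form, it uses the unnormalized reflector $I - 2vv^T/v^Tv$ instead of the $(v, \beta)$ convention with $v(1) = 1$, and the function name `householder_ls` is hypothetical.

```python
import math


def householder_ls(A, b):
    """Return (x_LS, rho_LS) for min ||Ax - b||_2, with A of full column rank."""
    A = [[float(t) for t in row] for row in A]  # work on copies
    b = [float(t) for t in b]
    m, n = len(A), len(A[0])
    for j in range(n):
        x = [A[i][j] for i in range(j, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            raise ValueError("A does not have full column rank")
        # v = x + sign(x_1) ||x|| e_1, so that H x = -sign(x_1) ||x|| e_1.
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        vtv = sum(t * t for t in v)
        # Apply H = I - 2 v v^T / (v^T v) to A(j:m, j:n).
        for col in range(j, n):
            s = 2.0 * sum(v[i] * A[j + i][col] for i in range(m - j)) / vtv
            for i in range(m - j):
                A[j + i][col] -= s * v[i]
        # Apply the same reflection to b(j:m).
        s = 2.0 * sum(v[i] * b[j + i] for i in range(m - j)) / vtv
        for i in range(m - j):
            b[j + i] -= s * v[i]
    # Back substitution on the leading n-by-n triangle: R_1 x = c.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][k] * x[k] for k in range(i + 1, n))) / A[i][i]
    rho = math.sqrt(sum(t * t for t in b[n:]))  # ||d||_2
    return x, rho
```

The second return value is the norm of the trailing part of the transformed right-hand side, which equals $\rho_{LS}$ by the identity derived above; no extra pass over $A$ is needed to obtain the residual size.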

It can be shown that the computed $\hat{x}_{LS}$ solves

$$
\min \| (A + \delta A)x - (b + \delta b) \|_2 \tag{5.3.4}
$$

where

$$
\| \delta A \|_F \le (6m - 3n + 41)\, n u \| A \|_F + O(u^2) \tag{5.3.5}
$$

and

$$
\| \delta b \|_2 \le (6m - 3n + 40)\, n u \| b \|_2 + O(u^2). \tag{5.3.6}
$$

These inequalities are established in Lawson and Hanson (1974, p.90ff) and
show that $\hat{x}_{LS}$ satisfies a “nearby” LS problem. (We cannot address the
relative error in $\hat{x}_{LS}$ without an LS perturbation theory, to be discussed
shortly.) We mention that similar results hold if Givens QR is used.


5.3.4 Breakdown in Near-Rank Deficient Case 


Like the method of normal equations, the Householder method for solving
the LS problem breaks down in the back substitution phase if rank($A$) < $n$.
Numerically, trouble can be expected whenever $\kappa_2(A) = \kappa_2(R) \approx 1/u$.
This is in contrast to the normal equations approach, where completion
of the Cholesky factorization becomes problematical once $\kappa_2(A)$ is in the
neighborhood of $1/\sqrt{u}$. (See Example 5.3.2.) Hence the claim in Lawson
and Hanson (1974, 126-127) that for a fixed machine precision, a wider
class of LS problems can be solved using Householder orthogonalization.


5.3.5 A Note on the MGS Approach


In principle, MGS computes the thin QR factorization $A = Q_1 R_1$. This is
enough to solve the full rank LS problem because it transforms the normal
equations $(A^T A)x = A^T b$ to the upper triangular system $R_1 x = Q_1^T b$.
But an analysis of this approach when $Q_1^T b$ is explicitly formed introduces
a $\kappa_2(A)^2$ term. This is because the computed factor $\hat{Q}_1$ satisfies
$\| \hat{Q}_1^T \hat{Q}_1 - I_n \|_2 \approx u \kappa_2(A)$ as we mentioned in §5.2.9.

However, if MGS is applied to the augmented matrix

$$
A_+ = [\, A \;\; b \,] = [\, Q_1 \;\; q_{n+1} \,] \begin{bmatrix} R_1 & z \\ 0 & \rho \end{bmatrix},
$$

then $z = Q_1^T b$. Computing $Q_1^T b$ in this fashion and solving $R_1 x_{LS} = z$
produces an LS solution $\hat{x}_{LS}$ that is “just as good” as the Householder QR
method. That is to say, a result of the form (5.3.4)-(5.3.6) applies. See
Björck and Paige (1992).

It should be noted that the MGS method is slightly more expensive
than Householder QR because it always manipulates m-vectors whereas
the latter procedure deals with ever shorter vectors.
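The augmented-matrix idea can be sketched as follows: run modified Gram-Schmidt on $[A\ b]$, treating $b$ as column $n+1$, so that $z$ is produced by the same sequence of projections that orthogonalizes the columns of $A$. This is an illustrative sketch only; the function name `mgs_augmented_ls` is hypothetical, matrices are lists of rows, and the last column is left unnormalized because only its norm $\rho$ is needed.

```python
import math


def mgs_augmented_ls(A, b):
    """Return (x_LS, rho) by applying MGS to the augmented matrix [A b]."""
    m, n = len(A), len(A[0])
    # Columns of [A b]; column index n holds b.
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    cols.append([float(t) for t in b])
    R = [[0.0] * n for _ in range(n)]
    z = [0.0] * n
    for k in range(n):
        R[k][k] = math.sqrt(sum(t * t for t in cols[k]))
        if R[k][k] == 0.0:
            raise ValueError("A does not have full column rank")
        q = [t / R[k][k] for t in cols[k]]
        # Remove the q_k component from every later column, including b.
        for j in range(k + 1, n + 1):
            rkj = sum(qi * cj for qi, cj in zip(q, cols[j]))
            cols[j] = [cj - rkj * qi for qi, cj in zip(q, cols[j])]
            if j < n:
                R[k][j] = rkj
            else:
                z[k] = rkj  # k-th component of z = Q_1^T b
    rho = math.sqrt(sum(t * t for t in cols[n]))  # size of what is left of b
    # Back substitution: R_1 x = z.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x, rho
```

The point of the design is that $z$ is never formed as an explicit product with the computed $Q_1$, whose columns may have lost orthogonality.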


5.3.6 Fast Givens LS Solver 


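The LS problem can also be solved using fast Givens transformations. Suppose
$M^T M = D$ is diagonal and

$$
M^T A = \begin{bmatrix} S_1 \\ 0 \end{bmatrix}
$$

is upper triangular, where $S_1$ has $n$ rows and the zero block has $m - n$ rows. If

$$
M^T b = \begin{bmatrix} c \\ d \end{bmatrix}
$$

with $c \in \mathbb{R}^n$ and $d \in \mathbb{R}^{m-n}$, then

$$
\| Ax - b \|_2^2 = \| D^{-1/2} M^T A x - D^{-1/2} M^T b \|_2^2
= \left\| D^{-1/2} \left( \begin{bmatrix} S_1 \\ 0 \end{bmatrix} x - \begin{bmatrix} c \\ d \end{bmatrix} \right) \right\|_2^2
$$

for any $x \in \mathbb{R}^n$. Clearly, $x_{LS}$ is obtained by solving the nonsingular upper
triangular system $S_1 x = c$.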

The computed solution $\hat{x}_{LS}$ obtained in this fashion can be shown to
solve a nearby LS problem in the sense of (5.3.4)-(5.3.6). This may seem
surprising since large numbers can arise during the calculation. An entry
in the scaling matrix $D$ can double in magnitude after a single fast Givens
update. However, largeness in $D$ must be exactly compensated for by largeness
in $M$, since $M D^{-1/2}$ is orthogonal at all stages of the computation.
It is this phenomenon that enables one to push through a favorable error
analysis.


5.3.7 The Sensitivity of the LS Problem 


We now develop a perturbation theory that assists in the comparison of
the normal equations and QR approaches to the LS problem. The theorem
below examines how the LS solution and its residual are affected by changes
in $A$ and $b$. In so doing, the condition of the LS problem is identified.
Two easily established facts are required in the analysis:

$$
\begin{aligned}
\| A \|_2 \, \| (A^T A)^{-1} A^T \|_2 &= \kappa_2(A) \\
\| A \|_2^2 \, \| (A^T A)^{-1} \|_2 &= \kappa_2(A)^2
\end{aligned} \tag{5.3.7}
$$

These equations can be verified using the SVD.
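Theorem 5.3.1 Suppose $x$, $r$, $\hat{x}$, and $\hat{r}$ satisfy

$$
\begin{aligned}
\| Ax - b \|_2 &= \min, & r &= b - Ax, \\
\| (A + \delta A)\hat{x} - (b + \delta b) \|_2 &= \min, & \hat{r} &= (b + \delta b) - (A + \delta A)\hat{x},
\end{aligned}
$$

where $A$ and $\delta A$ are in $\mathbb{R}^{m \times n}$ with $m \ge n$ and $0 \ne b$ and $\delta b$ are in $\mathbb{R}^m$. If

$$
\epsilon = \max\left\{ \frac{\| \delta A \|_2}{\| A \|_2}, \frac{\| \delta b \|_2}{\| b \|_2} \right\} < \frac{\sigma_n(A)}{\sigma_1(A)}
$$

and

$$
\sin(\theta) = \frac{\rho_{LS}}{\| b \|_2} \ne 1
$$

where $\rho_{LS} = \| Ax_{LS} - b \|_2$, then

$$
\frac{\| \hat{x} - x \|_2}{\| x \|_2} \le \epsilon \left\{ \frac{2\kappa_2(A)}{\cos(\theta)} + \tan(\theta)\kappa_2(A)^2 \right\} + O(\epsilon^2) \tag{5.3.8}
$$

$$
\frac{\| \hat{r} - r \|_2}{\| b \|_2} \le \epsilon \left( 1 + 2\kappa_2(A) \right) \min(1, m - n) + O(\epsilon^2). \tag{5.3.9}
$$

Proof. Let $E$ and $f$ be defined by $E = \delta A/\epsilon$ and $f = \delta b/\epsilon$. By hypothesis
$\| \delta A \|_2 < \sigma_n(A)$ and so by Theorem 2.5.2 we have rank($A + tE$) = $n$ for
all $t \in [0, \epsilon]$. It follows that the solution $x(t)$ to

$$
(A + tE)^T (A + tE) x(t) = (A + tE)^T (b + tf) \tag{5.3.10}
$$

is continuously differentiable for all $t \in [0, \epsilon]$. Since $x = x(0)$ and $\hat{x} = x(\epsilon)$,
we have

$$
\hat{x} = x + \epsilon \dot{x}(0) + O(\epsilon^2).
$$

The assumptions $b \ne 0$ and $\sin(\theta) \ne 1$ ensure that $x$ is nonzero and so

$$
\frac{\| \hat{x} - x \|_2}{\| x \|_2} = \epsilon \frac{\| \dot{x}(0) \|_2}{\| x \|_2} + O(\epsilon^2). \tag{5.3.11}
$$

In order to bound $\| \dot{x}(0) \|_2$, we differentiate (5.3.10) and set $t = 0$ in the
result. This gives

$$
E^T A x + A^T E x + A^T A \dot{x}(0) = A^T f + E^T b
$$

i.e.,

$$
\dot{x}(0) = (A^T A)^{-1} A^T (f - Ex) + (A^T A)^{-1} E^T r. \tag{5.3.12}
$$

By substituting this result into (5.3.11), taking norms, and using the easily
verified inequalities $\| f \|_2 \le \| b \|_2$ and $\| E \|_2 \le \| A \|_2$ we obtain

$$
\frac{\| \hat{x} - x \|_2}{\| x \|_2} \le \epsilon \left\{ \| A \|_2 \| (A^T A)^{-1} A^T \|_2 \left( \frac{\| b \|_2}{\| A \|_2 \| x \|_2} + 1 \right)
+ \frac{\rho_{LS}}{\| A \|_2 \| x \|_2} \| A \|_2^2 \| (A^T A)^{-1} \|_2 \right\} + O(\epsilon^2).
$$

Since $A^T(Ax - b) = 0$, $Ax$ is orthogonal to $Ax - b$ and so

$$
\| b - Ax \|_2^2 + \| Ax \|_2^2 = \| b \|_2^2.
$$

Thus,

$$
\| A \|_2^2 \| x \|_2^2 \ge \| b \|_2^2 - \rho_{LS}^2
$$

and so by using (5.3.7)

$$
\frac{\| \hat{x} - x \|_2}{\| x \|_2} \le \epsilon \left\{ \kappa_2(A) \left( \frac{1}{\cos(\theta)} + 1 \right) + \kappa_2(A)^2 \frac{\sin(\theta)}{\cos(\theta)} \right\} + O(\epsilon^2)
$$

thereby establishing (5.3.8).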
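To prove (5.3.9), we define the differentiable vector function $r(t)$ by

$$
r(t) = (b + tf) - (A + tE)x(t)
$$

and observe that $r = r(0)$ and $\hat{r} = r(\epsilon)$. Using (5.3.12) it can be shown
that

$$
\dot{r}(0) = \left( I - A(A^T A)^{-1} A^T \right)(f - Ex) - A(A^T A)^{-1} E^T r.
$$

Since $\| \hat{r} - r \|_2 = \epsilon \| \dot{r}(0) \|_2 + O(\epsilon^2)$ we have

$$
\begin{aligned}
\frac{\| \hat{r} - r \|_2}{\| b \|_2} &= \epsilon \frac{\| \dot{r}(0) \|_2}{\| b \|_2} + O(\epsilon^2) \\
&\le \epsilon \left\{ \| I - A(A^T A)^{-1} A^T \|_2 \left( 1 + \frac{\| A \|_2 \| x \|_2}{\| b \|_2} \right)
+ \| A(A^T A)^{-1} \|_2 \| A \|_2 \frac{\rho_{LS}}{\| b \|_2} \right\} + O(\epsilon^2).
\end{aligned}
$$

Inequality (5.3.9) now follows because

$$
\| A \|_2 \| x \|_2 = \| A \|_2 \| A^+ b \|_2 \le \kappa_2(A) \| b \|_2,
$$

$$
\rho_{LS} = \| (I - A(A^T A)^{-1} A^T) b \|_2 \le \| I - A(A^T A)^{-1} A^T \|_2 \| b \|_2,
$$

and

$$
\| I - A(A^T A)^{-1} A^T \|_2 = \min(m - n, 1). \qquad \Box
$$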


An interesting feature of the upper bound in (5.3.8) is the factor

$$
\tan(\theta)\kappa_2(A)^2 = \frac{\rho_{LS}}{\sqrt{\| b \|_2^2 - \rho_{LS}^2}} \, \kappa_2(A)^2.
$$

Thus, in nonzero residual problems it is the square of the condition that
measures the sensitivity of $x_{LS}$. In contrast, residual sensitivity depends
just linearly on $\kappa_2(A)$. These dependencies are confirmed by Example 5.3.1.


5.3.8 Normal Equations Versus QR


It is instructive to compare the normal equation and QR approaches to the 
LS problem. Recall the following main points from our discussion: 


• The sensitivity of the LS solution is roughly proportional to the quantity
$\kappa_2(A) + \rho_{LS}\kappa_2(A)^2$.


• The method of normal equations produces an $\hat{x}_{LS}$ whose relative error
depends on the square of the condition.


• The QR approach (Householder, Givens, careful MGS) solves a nearby
LS problem and therefore produces a solution that has a relative error
approximately given by $u(\kappa_2(A) + \rho_{LS}\kappa_2(A)^2)$.

Thus, we may conclude that if $\rho_{LS}$ is small and $\kappa_2(A)$ is large, then the
method of normal equations does not solve a nearby problem and will usually
render an LS solution that is less accurate than a stable QR approach.
Conversely, the two methods produce comparably inaccurate results when
applied to large residual, ill-conditioned problems.

Finally, we mention two other factors that figure in the debate about 
QR versus normal equations: 


• The normal equations approach involves about half of the arithmetic
when $m \gg n$ and does not require as much storage.


• QR approaches are applicable to a wider class of matrices because
the Cholesky process applied to $A^T A$ breaks down “before” the back
substitution process on $Q^T A = R$.


At the very minimum, this discussion should convince you how difficult it 
can be to choose the “right” algorithm! 


Problems 


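P5.3.1 Assume $A^T A x = A^T b$, $(A^T A + F)\hat{x} = A^T b$, and $2\| F \|_2 < \sigma_n(A)^2$. Show that
if $r = b - Ax$ and $\hat{r} = b - A\hat{x}$, then $\hat{r} - r = A(A^T A + F)^{-1} F x$ and

$$
\| \hat{r} - r \|_2 \le 2\kappa_2(A) \frac{\| F \|_2}{\| A \|_2} \| x \|_2.
$$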


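P5.3.2 Assume that $A^T A x = A^T b$ and that $A^T A \hat{x} = A^T b + f$ where $\| f \|_2 \le
c u \| A^T \|_2 \| b \|_2$ and $A$ has full column rank. Show that

$$
\frac{\| x - \hat{x} \|_2}{\| x \|_2} \le c u \kappa_2(A)^2 \frac{\| A^T \|_2 \| b \|_2}{\| A^T b \|_2}.
$$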


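P5.3.3 Let $A \in \mathbb{R}^{m \times n}$ with $m > n$ and $y \in \mathbb{R}^m$ and define $\bar{A} = [A\ y] \in \mathbb{R}^{m \times (n+1)}$.
Show that $\sigma_1(\bar{A}) \ge \sigma_1(A)$ and $\sigma_{n+1}(\bar{A}) \le \sigma_n(A)$. Thus, the condition grows if a column
is added to a matrix.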


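P5.3.4 Let $A \in \mathbb{R}^{m \times n}$ ($m \ge n$), $w \in \mathbb{R}^n$, and define

$$
B = \begin{bmatrix} A \\ w^T \end{bmatrix}.
$$

Show that $\sigma_n(B) \ge \sigma_n(A)$ and $\sigma_1(B) \le \sqrt{\| A \|_2^2 + \| w \|_2^2}$. Thus, the condition of a
matrix may increase or decrease if a row is added.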

P5.3.5 (Cline 1973) Suppose that $A \in \mathbb{R}^{m \times n}$ has rank $n$ and that Gaussian elimination
with partial pivoting is used to compute the factorization $PA = LU$, where $L \in \mathbb{R}^{m \times n}$ is
unit lower triangular, $U \in \mathbb{R}^{n \times n}$ is upper triangular, and $P \in \mathbb{R}^{m \times m}$ is a permutation.
Explain how the decomposition in P5.2.5 can be used to find a vector $z \in \mathbb{R}^n$ such that
$\| Lz - Pb \|_2$ is minimized. Show that if $Ux = z$, then $\| Ax - b \|_2$ is minimum. Show
that this method of solving the LS problem is more efficient than Householder QR from
the flop point of view whenever $m \le 5n/3$.

P5.3.6 The matrix $C = (A^T A)^{-1}$, where rank($A$) = $n$, arises in many statistical
applications and is known as the variance-covariance matrix. Assume that the factorization
$A = QR$ is available. (a) Show $C = (R^T R)^{-1}$. (b) Give an algorithm for computing the
diagonal of $C$ that requires $n^3/3$ flops. (c) Show that

$$
R = \begin{bmatrix} \alpha & v^T \\ 0 & S \end{bmatrix}
\quad \Rightarrow \quad
C = (R^T R)^{-1} = \begin{bmatrix} (1 + v^T C_1 v)/\alpha^2 & -v^T C_1/\alpha \\ -C_1 v/\alpha & C_1 \end{bmatrix}
$$

where $C_1 = (S^T S)^{-1}$. (d) Using (c), give an algorithm that overwrites the upper
triangular portion of $R$ with the upper triangular portion of $C$. Your algorithm should
require $2n^3/3$ flops.

P5.3.7 Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric and that $r = b - Ax$ where $r, b, x \in \mathbb{R}^n$ and
$x$ is nonzero. Show how to compute a symmetric $E \in \mathbb{R}^{n \times n}$ with minimal Frobenius
norm so that $(A + E)x = b$. Hint. Use the QR factorization of $[x, r]$ and note that
$Ex = r \Rightarrow (Q^T E Q)(Q^T x) = Q^T r$.

P5.3.8 Show how to compute the nearest circulant matrix to a given Toeplitz matrix. 
Measure distance with the Frobenius norm. 


Notes and References for Sec. 5.3 


Our restriction to least squares approximation is not a vote against minimization in other
norms. There are occasions when it is advisable to minimize $\| Ax - b \|_p$ for $p = 1$ and
$\infty$. Some algorithms for doing this are described in


A.K. Cline (1976a). “A Descent Method for the Uniform Solution to Overdetermined
Systems of Equations,” SIAM J. Num. Anal. 13, 293-309.

R.H. Bartels, A.R. Conn, and C. Charalambous (1978). “On Cline's Direct Method for
Solving Overdetermined Linear Systems in the L∞ Sense,” SIAM J. Num. Anal. 15,
255-70.

T.F. Coleman and Y. Li (1992). “A Globally and Quadratically Convergent Affine
Scaling Method for Linear L1 Problems,” Mathematical Programming, 56, Series A,
189-222.

Y. Li (1993). “A Globally Convergent Method for Lp Problems,” SIAM J. Optimization
3, 609-629.

Y. Zhang (1993). “A Primal-Dual Interior Point Approach for Computing the L1 and
L∞ Solutions of Overdetermined Linear Systems,” J. Optimization Theory and
Applications 77, 323-341.


The use of Gauss transformations to solve the LS problem has attracted some attention 
because they are cheaper to use than Householder or Givens matrices. See 


G. Peters and J.H. Wilkinson (1970). “The Least Squares Problem and Pseudo-Inverses,”
Comp. J. 13, 309-16.

A.K. Cline (1973). “An Elimination Method for the Solution of Linear Least Squares
Problems,” SIAM J. Num. Anal. 10, 283-89.

R.J. Plemmons (1974). “Linear Least Squares by Elimination and MGS,” J. Assoc.
Comp. Mach. 21, 581-85.


Important analyses of the LS problem and various solution approaches include


G.H. Golub and J.H. Wilkinson (1966). “Note on the Iterative Refinement of Least
Squares Solution,” Numer. Math. 9, 139-48.

A. van der Sluis (1975). “Stability of the Solutions of Linear Least Squares Problem,”
Numer. Math. 23, 241-54.

Y. Saad (1986). “On the Condition Number of Some Gram Matrices Arising from Least
Squares Approximation in the Complex Plane,” Numer. Math. 48, 337-348.

A. Björck (1987). “Stability Analysis of the Method of Seminormal Equations,” Lin.
Alg. and Its Applic. 88/89, 31-48.




J. Gluchowska and A. Smoktunowicz (1990). “Solving the Linear Least Squares Problem
with Very High Relative Accuracy,” Computing 45, 345-354.

A. Björck (1991). “Component-wise Perturbation Analysis and Error Bounds for Linear
Least Squares Solutions,” BIT 31, 238-244.

A. Björck and C.C. Paige (1992). “Loss and Recapture of Orthogonality in the Modified
Gram-Schmidt Algorithm,” SIAM J. Matrix Anal. Appl. 13, 176-190.

B. Waldén, R. Karlson, J. Sun (1995). “Optimal Backward Perturbation Bounds for the
Linear Least Squares Problem,” Numerical Lin. Alg. with Applic. 2, 271-286.


The “seminormal” equations are given by $R^T R x = A^T b$ where $A = QR$. In the above
paper it is shown that by solving the seminormal equations an acceptable LS solution is
obtained if one step of fixed precision iterative improvement is performed.

An Algol implementation of the MGS method for solving the LS problem appears in


F.L. Bauer (1965). “Elimination with Weighted Row Combinations for Solving Linear
Equations and Least Squares Problems,” Numer. Math. 7, 338-52. See also
Wilkinson and Reinsch (1971, 119-33).


Least squares problems often have special structure which, of course, should be exploited. 


M.G. Cox (1981). “The Least Squares Solution of Overdetermined Linear Equations
having Band or Augmented Band Structure,” IMA J. Num. Anal. 1, 3-22.

G. Cybenko (1984). “The Numerical Stability of the Lattice Algorithm for Least Squares
Linear Prediction Problems,” BIT 24, 441-455.

P.C. Hansen and H. Gesmar (1993). “Fast Orthogonal Decomposition of Rank-Deficient
Toeplitz Matrices,” Numerical Algorithms 4, 151-166.


The use of Householder matrices to solve sparse LS problems requires careful attention
to avoid excessive fill-in.


J.K. Reid (1967). “A Note on the Least Squares Solution of a Band System of Linear 
Equations by Householder Reductions,” Comp. J. 10, 188-89. 

I.S. Duff and J.K. Reid (1976). “A Comparison of Some Methods for the Solution of
Sparse Over-Determined Systems of Linear Equations,” J. Inst. Math. Applic. 17,
267-80.

P.E. Gill and W. Murray (1976). “The Orthogonal Factorization of a Large Sparse
Matrix,” in Sparse Matrix Computations, ed. J.R. Bunch and D.J. Rose, Academic
Press, New York, pp. 177-200.

L. Kaufman (1979). “Application of Dense Householder Transformations to a Sparse 
Matrix,” ACM Trans. Math. Soft. 5, 442-51. 


Although the computation of the QR factorization is more efficient with Householder
reflections, there are some settings where the Givens approach is advantageous. For
example, if A is sparse, then the careful application of Givens rotations can minimize fill-in.


I.S. Duff (1974). “Pivot Selection and Row Ordering in Givens Reduction on Sparse
Matrices,” Computing 13, 239-48.

J.A. George and M.T. Heath (1980). “Solution of Sparse Linear Least Squares Problems
Using Givens Rotations,” Lin. Alg. and Its Applic. 34, 69-83.




5.4 Other Orthogonal Factorizations 


If $A$ is rank deficient, then the QR factorization need not give a basis for
ran($A$). This problem can be corrected by computing the QR factorization
of a column-permuted version of $A$, i.e., $A\Pi = QR$ where $\Pi$ is a permutation.

The “data” in $A$ can be compressed further if we permit right multiplication
by a general orthogonal matrix $Z$:

$$
Q^T A Z = T.
$$


There are interesting choices for Q and Z and these, together with the 
column pivoted QR factorization, are discussed in this section. 


5.4.1 Rank Deficiency: QR with Column Pivoting 


If $A \in \mathbb{R}^{m \times n}$ and rank($A$) < $n$, then the QR factorization does not
necessarily produce an orthonormal basis for ran($A$). For example, if $A$ has
three columns and

$$
A = [a_1, a_2, a_3] = [q_1, q_2, q_3] \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
$$

is its QR factorization, then rank($A$) = 2 but ran($A$) does not equal any of
the subspaces span$\{q_1, q_2\}$, span$\{q_1, q_3\}$, or span$\{q_2, q_3\}$.

Fortunately, the Householder QR factorization procedure (Algorithm
5.2.1) can be modified in a simple way to produce an orthonormal basis for
ran($A$). The modified algorithm computes the factorization

$$
Q^T A \Pi = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix} \tag{5.4.1}
$$

where the row blocks have $r$ and $m - r$ rows, the column blocks have $r$ and $n - r$
columns, $r$ = rank($A$), $Q$ is orthogonal, $R_{11}$ is upper triangular and
nonsingular, and $\Pi$ is a permutation. If we have the column partitionings
$A\Pi = [a_{c_1}, \ldots, a_{c_n}]$ and $Q = [q_1, \ldots, q_m]$, then for $k = 1{:}n$ we have

$$
a_{c_k} = \sum_{i=1}^{\min\{r, k\}} r_{ik} q_i \in \mathrm{span}\{q_1, \ldots, q_r\}
$$

implying

$$
\mathrm{ran}(A) = \mathrm{span}\{q_1, \ldots, q_r\}.
$$

The matrices $Q$ and $\Pi$ are products of Householder matrices and interchange
matrices respectively. Assume for some $k$ that we have computed
Householder matrices $H_1, \ldots, H_{k-1}$ and permutations $\Pi_1, \ldots, \Pi_{k-1}$ such
that

$$
(H_{k-1} \cdots H_1) A (\Pi_1 \cdots \Pi_{k-1}) = R^{(k-1)} =
\begin{bmatrix} R_{11}^{(k-1)} & R_{12}^{(k-1)} \\ 0 & R_{22}^{(k-1)} \end{bmatrix} \tag{5.4.2}
$$

where the row blocks have $k - 1$ and $m - k + 1$ rows, the column blocks have $k - 1$
and $n - k + 1$ columns, and $R_{11}^{(k-1)}$ is a nonsingular and upper triangular matrix.
Now suppose that

$$
R_{22}^{(k-1)} = \left[ z_k^{(k-1)}, \ldots, z_n^{(k-1)} \right]
$$

is a column partitioning and let $p \ge k$ be the smallest index such that

$$
\| z_p^{(k-1)} \|_2 = \max\left\{ \| z_k^{(k-1)} \|_2, \ldots, \| z_n^{(k-1)} \|_2 \right\}. \tag{5.4.3}
$$

Note that if $k - 1$ = rank($A$), then this maximum is zero and we are finished.
Otherwise, let $\Pi_k$ be the n-by-n identity with columns $p$ and $k$ interchanged
and determine a Householder matrix $H_k$ such that if $R^{(k)} = H_k R^{(k-1)} \Pi_k$,
then $R^{(k)}(k+1{:}m, k) = 0$. In other words, $\Pi_k$ moves the largest column in
$R_{22}^{(k-1)}$ to the lead position and $H_k$ zeroes all of its subdiagonal components.

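The column norms do not have to be recomputed at each stage if we
exploit the property

$$
Q^T z = \begin{bmatrix} \alpha \\ w \end{bmatrix}
\quad \Rightarrow \quad
\| w \|_2^2 = \| z \|_2^2 - \alpha^2
$$

which holds for any orthogonal matrix $Q \in \mathbb{R}^{s \times s}$, where $\alpha$ is a scalar and
$w \in \mathbb{R}^{s-1}$. This reduces the overhead
associated with column pivoting from $O(mn^2)$ flops to $O(mn)$ flops because
we can get the new column norms by updating the old column norms, e.g.,

$$
\| z_j^{(k)} \|_2^2 = \| z_j^{(k-1)} \|_2^2 - r_{kj}^2.
$$

Combining all of the above we obtain the following algorithm established
by Businger and Golub (1965):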


Algorithm 5.4.1 (Householder QR With Column Pivoting) Given
$A \in \mathbb{R}^{m \times n}$ with $m \ge n$, the following algorithm computes $r$ = rank($A$)
and the factorization (5.4.1) with $Q = H_1 \cdots H_r$ and $\Pi = \Pi_1 \cdots \Pi_r$. The
upper triangular part of $A$ is overwritten by the upper triangular part of
$R$ and components $j + 1{:}m$ of the $j$th Householder vector are stored in
$A(j + 1{:}m, j)$. The permutation $\Pi$ is encoded in an integer vector piv. In
particular, $\Pi_j$ is the identity with rows $j$ and piv($j$) interchanged.

    for j = 1:n
        c(j) = A(1:m, j)^T A(1:m, j)
    end
    r = 0; τ = max{c(1), ..., c(n)}
    Find smallest k with 1 ≤ k ≤ n so c(k) = τ
    while τ > 0
        r = r + 1
        piv(r) = k; A(1:m, r) ↔ A(1:m, k); c(r) ↔ c(k)
        [v, β] = house(A(r:m, r))
        A(r:m, r:n) = (I_{m-r+1} - β v v^T) A(r:m, r:n)
        A(r+1:m, r) = v(2:m-r+1)
        for i = r+1:n
            c(i) = c(i) - A(r, i)^2
        end
        if r < n
            τ = max{c(r+1), ..., c(n)}
            Find smallest k with r+1 ≤ k ≤ n so c(k) = τ.
        else
            τ = 0
        end
    end


This algorithm requires $4mnr - 2r^2(m + n) + 4r^3/3$ flops where $r$ = rank($A$).
As with the nonpivoting procedure, Algorithm 5.2.1, the orthogonal matrix
$Q$ is stored in factored form in the subdiagonal portion of $A$.
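The pivoting strategy and the norm-downdating trick of Algorithm 5.4.1 can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the book's implementation: the Householder vectors are discarded rather than stored below the diagonal, indices are 0-based, the function name `qr_column_pivoting` is hypothetical, and the exact test `τ > 0` is replaced by a relative tolerance `tol` (an assumption introduced here, since in floating point the downdated norms rarely reach zero exactly).

```python
import math


def qr_column_pivoting(A, tol=1e-12):
    """Return (R, piv, rank) for Householder QR with column pivoting.

    R holds the first min(m, n) rows of the triangularized matrix, piv[j] is
    the column swapped into position j at step j, and rank is the number of
    steps taken before the largest remaining column norm fell below
    tol * (largest initial squared column norm).
    """
    A = [[float(t) for t in row] for row in A]
    m, n = len(A), len(A[0])
    c = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]  # squared norms
    threshold = tol * max(c)
    piv, rank = [], 0
    while rank < n:
        tau = max(c[rank:])
        if tau <= threshold:
            break
        k = c.index(tau, rank)  # smallest k >= rank attaining the maximum
        piv.append(k)
        for row in A:
            row[rank], row[k] = row[k], row[rank]
        c[rank], c[k] = c[k], c[rank]
        # Householder reflection that zeroes A(rank+1:m, rank).
        j = rank
        x = [A[i][j] for i in range(j, m)]
        v = x[:]
        v[0] += math.copysign(math.sqrt(sum(t * t for t in x)), x[0])
        vtv = sum(t * t for t in v)
        if vtv > 0.0:
            for col in range(j, n):
                s = 2.0 * sum(v[i] * A[j + i][col] for i in range(m - j)) / vtv
                for i in range(m - j):
                    A[j + i][col] -= s * v[i]
        rank += 1
        # Downdate the squared column norms instead of recomputing them.
        for i in range(rank, n):
            c[i] -= A[j][i] ** 2
    R = [[A[i][col] if col >= i else 0.0 for col in range(n)]
         for i in range(min(m, n))]
    return R, piv, rank
```

Because the downdated quantities in `c` are differences of nearly equal numbers once a column has been almost fully reduced, production codes guard this update more carefully; the sketch keeps the plain subtraction to stay close to the algorithm as stated.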


Example 5.4.1 If Algorithm 5.4.1 is applied to

$$
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix},
$$

then $\Pi = [e_3\ e_2\ e_1]$ and to three significant digits we obtain


2e E E pee -164 -14600 -—1.820 
: 0.0 816 —.816 


548 — .000 113 -.829 
—T30 408  .200 510 0.0 900 ^ 0.000 


$A\Pi = QR =$


5.4.2 Complete Orthogonal Decompositions 


The matrix $R$ produced by Algorithm 5.4.1 can be further reduced if it
is post-multiplied by an appropriate sequence of Householder matrices. In
particular, we can use Algorithm 5.2.1 to compute

$$
Z_r \cdots Z_1 \begin{bmatrix} R_{11}^T \\ R_{12}^T \end{bmatrix} = \begin{bmatrix} T_{11}^T \\ 0 \end{bmatrix} \tag{5.4.4}
$$

where the row blocks have $r$ and $n - r$ rows, the $Z_i$ are Householder transformations,
and $T_{11}^T$ is upper triangular. It then follows that

$$
Q^T A Z = T = \begin{bmatrix} T_{11} & 0 \\ 0 & 0 \end{bmatrix} \tag{5.4.5}
$$

where the row blocks have $r$ and $m - r$ rows, the column blocks have $r$ and $n - r$
columns, and $Z = \Pi Z_1 \cdots Z_r$. We refer to any decomposition of this form as a
complete orthogonal decomposition. Note that null($A$) = ran($Z(1{:}n, r + 1{:}n)$).
See P5.2.5 for details about the exploitation of structure in (5.4.4).


5.4.3  Bidiagonalization 


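Suppose $A \in \mathbb{R}^{m \times n}$ and $m \ge n$. We next show how to compute orthogonal
$U_B$ (m-by-m) and $V_B$ (n-by-n) such that

$$
U_B^T A V_B = \begin{bmatrix}
d_1 & f_1 & 0 & \cdots & 0 \\
0 & d_2 & f_2 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & & d_{n-1} & f_{n-1} \\
0 & \cdots & & 0 & d_n \\
0 & \cdots & & & 0
\end{bmatrix}. \tag{5.4.6}
$$

$U_B = U_1 \cdots U_n$ and $V_B = V_1 \cdots V_{n-2}$ can each be determined as a product
of Householder matrices:

$$
\begin{bmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ \times & \times & \times & \times \\ \times & \times & \times & \times \\ \times & \times & \times & \times \end{bmatrix}
\xrightarrow{U_1}
\begin{bmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{bmatrix}
\xrightarrow{V_1}
$$

$$
\begin{bmatrix} \times & \times & 0 & 0 \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{bmatrix}
\xrightarrow{U_2}
\begin{bmatrix} \times & \times & 0 & 0 \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \end{bmatrix}
\xrightarrow{V_2}
$$

$$
\begin{bmatrix} \times & \times & 0 & 0 \\ 0 & \times & \times & 0 \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \end{bmatrix}
\xrightarrow{U_3}
\begin{bmatrix} \times & \times & 0 & 0 \\ 0 & \times & \times & 0 \\ 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times \\ 0 & 0 & 0 & \times \end{bmatrix}
\xrightarrow{U_4}
\begin{bmatrix} \times & \times & 0 & 0 \\ 0 & \times & \times & 0 \\ 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times \\ 0 & 0 & 0 & 0 \end{bmatrix}
$$

In general, $U_k$ introduces zeros into the $k$th column, while $V_k$ zeros the
appropriate entries in row $k$. Overall we have: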


Algorithm 5.4.2 (Householder Bidiagonalization) Given $A \in \mathbb{R}^{m \times n}$
with $m \ge n$, the following algorithm overwrites A with $U_B^T A V_B = B$ where
B is upper bidiagonal and $U_B = U_1 \cdots U_n$ and $V_B = V_1 \cdots V_{n-2}$. The
essential part of $U_j$'s Householder vector is stored in A(j + 1:m, j) and the
essential part of $V_j$'s Householder vector is stored in A(j, j + 2:n).


    for j = 1:n
        [v, beta] = house(A(j:m, j))
        A(j:m, j:n) = (I_{m-j+1} - beta v v^T) A(j:m, j:n)
        A(j + 1:m, j) = v(2:m - j + 1)
        if j <= n - 2
            [v, beta] = house(A(j, j + 1:n)^T)
            A(j:m, j + 1:n) = A(j:m, j + 1:n)(I_{n-j} - beta v v^T)
            A(j, j + 2:n) = v(2:n - j)^T
        end
    end


This algorithm requires $4mn^2 - 4n^3/3$ flops. Such a technique is used in
Golub and Kahan (1965), where bidiagonalization is first described. If the
matrices $U_B$ and $V_B$ are explicitly desired, then they can be accumulated
in $4m^2n - 4n^3/3$ and $4n^3/3$ flops, respectively. The bidiagonalization of A
is related to the tridiagonalization of $A^T A$. See §8.2.1.
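A minimal Python sketch of Algorithm 5.4.2 follows, under stated assumptions: the function names `house` and `bidiagonalize` are introduced here, indices are 0-based, and the Householder vectors are discarded rather than stored in the zeroed positions, so $U_B$ and $V_B$ cannot be recovered from the output. It is meant only to show the alternation of left and right reflections.

```python
import math

def house(x):
    """Householder vector v (v[0] = 1) and beta so that
    (I - beta v v^T) x is a multiple of e1."""
    sigma = sum(t * t for t in x[1:])
    if sigma == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0
    mu = math.sqrt(x[0] * x[0] + sigma)
    v0 = x[0] - mu if x[0] <= 0 else -sigma / (x[0] + mu)
    beta = 2.0 * v0 * v0 / (sigma + v0 * v0)
    return [1.0] + [t / v0 for t in x[1:]], beta

def bidiagonalize(A):
    """Return a copy of A (m >= n) reduced to upper bidiagonal form."""
    B = [[float(a) for a in row] for row in A]
    m, n = len(B), len(B[0])
    for j in range(n):
        # Left reflection: zero column j below the diagonal.
        v, beta = house([B[i][j] for i in range(j, m)])
        for col in range(j, n):
            s = beta * sum(v[i] * B[j + i][col] for i in range(m - j))
            for i in range(m - j):
                B[j + i][col] -= s * v[i]
        # Right reflection (1-based condition j <= n-2): zero row j
        # to the right of the superdiagonal.
        if j < n - 2:
            v, beta = house(B[j][j + 1:])
            for row in range(j, m):
                s = beta * sum(B[row][j + 1 + i] * v[i] for i in range(n - j - 1))
                for i in range(n - j - 1):
                    B[row][j + 1 + i] -= s * v[i]
    return B

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
B = bidiagonalize(A)
for row in B:
    print(["%.3f" % b for b in row])
```

Because orthogonal transformations preserve the Frobenius norm, the sum of squares of the entries of B should agree with that of A up to roundoff, which gives a cheap consistency check on any implementation.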


Example 5.4.2 If Algorithm 5.4.2 is applied to

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix},
\]

then to three significant digits we obtain


He r m : 100 000 O00 
B- . . Ve =| 000 -.667 —.745 


0 Ü o 0.00 —.745 687 


5.4.4 R-Bidiagonalization 


A faster method of bidiagonalizing when $m \gg n$ results if we upper
triangularize A first before applying Algorithm 5.4.2. In particular, suppose we
compute an orthogonal $Q \in \mathbb{R}^{m \times m}$ such that
\[
Q^T A = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}
\]
is upper triangular. We then bidiagonalize the square matrix $R_1$,
\[
U_R^T R_1 V_B = B_1 .
\]
Here $U_R$ and $V_B$ are n-by-n orthogonal and $B_1$ is n-by-n upper bidiagonal.
If $U_B = Q \,\mathrm{diag}(U_R, I_{m-n})$ then
\[
U_B^T A V_B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix} \equiv B
\]
is a bidiagonalization of A.

The idea of computing the bidiagonalization in this manner is mentioned
in Lawson and Hanson (1974, p.119) and more fully analyzed in Chan
(1982a). We refer to this method as R-bidiagonalization. By comparing its
flop count ($2mn^2 + 2n^3$) with that for Algorithm 5.4.2 ($4mn^2 - 4n^3/3$) we see
that it involves fewer computations (approximately) whenever $m \ge 5n/3$.
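The crossover at m = 5n/3 follows from setting the two flop counts equal: $4mn^2 - 4n^3/3 = 2mn^2 + 2n^3$ gives $2mn^2 = 10n^3/3$. The short Python check below evaluates both counts for a few shapes; the function names are introduced here for illustration only.

```python
def flops_householder_bidiag(m, n):
    """Flop count quoted for Algorithm 5.4.2."""
    return 4 * m * n**2 - 4 * n**3 / 3

def flops_r_bidiag(m, n):
    """Flop count quoted for R-bidiagonalization."""
    return 2 * m * n**2 + 2 * n**3

n = 300
for m in (300, 500, 3000):
    direct = flops_householder_bidiag(m, n)
    via_r = flops_r_bidiag(m, n)
    print(m, n, direct, via_r, "R-bidiag cheaper" if via_r < direct else "no gain")
```

With n = 300 the two counts coincide at m = 500, the square case favors Algorithm 5.4.2, and the tall case m = 3000 favors R-bidiagonalization.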


5.4.5 The SVD and its Computation 


Once the bidiagonalization of A has been achieved, the next step in the 
Golub-Reinsch SVD algorithm is to zero the superdiagonal elements in B. 
This is an iterative process and is accomplished by an algorithm due to 
Golub and Kahan (1965). Unfortunately, we must defer our discussion of 
this iteration until §8.6 as it requires an understanding of the symmetric 
eigenvalue problem. Suffice it to say here that it computes orthogonal
matrices $U_\Sigma$ and $V_\Sigma$ such that


\[
U_\Sigma^T B V_\Sigma = \Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{m \times n}.
\]


By defining $U = U_B U_\Sigma$ and $V = V_B V_\Sigma$ we see that $U^T A V = \Sigma$ is the SVD
of A. The flop counts associated with this portion of the algorithm depend
upon "how much" of the SVD is required. For example, when solving the
LS problem, $U^T$ need never be explicitly formed but merely applied to b
as it is developed. In other applications, only the matrix $U_1 = U(:, 1:n)$
is required. Altogether there are six possibilities and the total amount of 
work required by the SVD algorithm in each case is summarized in the 
table below. Because of the two possible bidiagonalization schemes, there 
are two columns of flop counts. If the bidiagonalization is achieved via 
Algorithm 5.4.2, the Golub-Reinsch (1970) SVD algorithm results, while if 
R-bidiagonalization is invoked we obtain the R-SVD algorithm detailed in 
Chan (1982a). By comparing the entries in this table (which are meant only
as approximate estimates of work), we conclude that the R-SVD approach
is more efficient unless $m \approx n$.

    Required              Golub-Reinsch SVD           R-SVD
    $\Sigma$              $4mn^2 - 4n^3/3$            $2mn^2 + 2n^3$
    $\Sigma, V$           $4mn^2 + 8n^3$              $2mn^2 + 11n^3$
    $\Sigma, U$           $4m^2n - 8mn^2$             $4m^2n + 13n^3$
    $\Sigma, U_1$         $14mn^2 - 2n^3$             $6mn^2 + 11n^3$
    $\Sigma, U, V$        $4m^2n + 8mn^2 + 9n^3$      $4m^2n + 22n^3$
    $\Sigma, U_1, V$      $14mn^2 + 8n^3$             $6mn^2 + 20n^3$


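Problems


P5.4.1 Suppose $A \in \mathbb{R}^{m \times n}$ with m < n. Give an algorithm for computing the
factorization
\[
U^T A V = [\, B \;\; 0 \,]
\]
where B is an m-by-m upper bidiagonal matrix. (Hint: Obtain the form
\[
\begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & 0 & \times & \times & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0
\end{bmatrix}
\]
using Householder matrices and then "chase" the (m, m + 1) entry up the (m + 1)st
column by applying Givens rotations from the right.)
P5.4.2 Show how to efficiently bidiagonalize an n-by-n upper triangular matrix using
Givens rotations.
P5.4.3 Show how to upper bidiagonalize a tridiagonal matrix $T \in \mathbb{R}^{n \times n}$ using Givens
rotations.
P5.4.4 Let $A \in \mathbb{R}^{m \times n}$ and assume that $0 \ne v$ satisfies $\| Av \|_2 = \sigma_n(A) \| v \|_2$. Let $\Pi$
be a permutation such that if $\Pi^T v = w$, then $|w_n| = \| w \|_\infty$. Show that if $A\Pi = QR$
is the QR factorization of $A\Pi$, then $|r_{nn}| \le \sqrt{n}\, \sigma_n(A)$. Thus, there always exists a
permutation $\Pi$ such that the QR factorization of $A\Pi$ "displays" near rank deficiency.
P5.4.5 Let $x, y \in \mathbb{R}^m$ and $Q \in \mathbb{R}^{m \times m}$ be given with Q orthogonal. Show that if
\[
Q^T x = \begin{bmatrix} \alpha \\ u \end{bmatrix} \begin{matrix} 1 \\ m-1 \end{matrix}
\qquad
Q^T y = \begin{bmatrix} \beta \\ v \end{bmatrix} \begin{matrix} 1 \\ m-1 \end{matrix}
\]
then $u^T v = x^T y - \alpha\beta$.
P5.4.6 Let $A = [\, a_1, \ldots, a_n \,] \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ be given. For any subset of A's
columns $\{ a_{c_1}, \ldots, a_{c_k} \}$ define
\[
\mathrm{res}[\, a_{c_1}, \ldots, a_{c_k} \,] = \min_{x \in \mathbb{R}^k} \| [\, a_{c_1}, \ldots, a_{c_k} \,] x - b \|_2 .
\]
Describe an alternative pivot selection procedure for Algorithm 5.4.1 such that if
$QR = A\Pi = [\, a_{c_1}, \ldots, a_{c_n} \,]$ in the final factorization, then for k = 1:n:
\[
\mathrm{res}[\, a_{c_1}, \ldots, a_{c_k} \,] = \min_{i \ge k} \mathrm{res}[\, a_{c_1}, \ldots, a_{c_{k-1}}, a_{c_i} \,] .
\]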


Notes and References for Sec. 5.4 
Aspects of the complete orthogonal decomposition are discussed in
R.J. Hanson and C.L. Lawson (1969). "Extensions and Applications of the Householder
Algorithm for Solving Linear Least Square Problems," Math. Comp. 23, 787-812.

P.A. Wedin (1973). "On the Almost Rank-Deficient Case of the Least Squares Problem,"
BIT 12, 344-54.

G.H. Golub and V. Pereyra (1976). "Differentiation of Pseudo-Inverses, Separable
Nonlinear Least Squares Problems and Other Tales," in Generalized Inverses and
Applications, ed. M.Z. Nashed, Academic Press, New York, pp. 303-24.


The computation of the SVD is detailed in §8.6. But here are some of the standard
references concerned with its calculation:


G.H. Golub and W. Kahan (1965). "Calculating the Singular Values and Pseudo-Inverse
of a Matrix," SIAM J. Num. Anal. 2, 205-24.

P.A. Businger and G.H. Golub (1969). "Algorithm 358: Singular Value Decomposition
of the Complex Matrix," Comm. ACM 12, 564-65.

G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares
Solutions," Numer. Math. 14, 403-20. See also Wilkinson and Reinsch (1971, pp.
134-51).

T.F. Chan (1982). "An Improved Algorithm for Computing the Singular Value
Decomposition," ACM Trans. Math. Soft. 8, 72-83.


QR with column pivoting was first discussed in


P.A. Businger and G.H. Golub (1965). "Linear Least Squares Solutions by Householder
Transformations," Numer. Math. 7, 269-76. See also Wilkinson and Reinsch (1971,
pp. 111-18).


Knowing when to stop in the algorithm is difficult. In questions of rank deficiency, it is
helpful to obtain information about the smallest singular value of the upper triangular
matrix R. This can be done using the techniques of §3.5.4 or those that are discussed in


I. Karasalo (1974). "A Criterion for Truncation of the QR Decomposition Algorithm for
the Singular Linear Least Squares Problem," BIT 14, 156-66.

N. Anderson and I. Karasalo (1975). "On Computing Bounds for the Least Singular
Value of a Triangular Matrix," BIT 15, 1-4.


Other aspects of rank estimation with QR are discussed in


L.V. Foster (1986). "Rank and Null Space Calculations Using Matrix Decomposition
without Column Interchanges," Lin. Alg. and Its Applic. 74, 47-71.

T.F. Chan (1987). "Rank Revealing QR Factorizations," Lin. Alg. and Its Applic.
88/89, 67-82.

T.F. Chan and P. Hansen (1992). "Some Applications of the Rank Revealing QR
Factorization," SIAM J. Sci. and Stat. Comp. 13, 727-741.

J.L. Barlow and U.B. Vemulapati (1992). "Rank Detection Methods for Sparse
Matrices," SIAM J. Matrix Anal. Appl. 13, 1279-1297.

T-M. Hwang, W-W. Lin, and E.K. Yang (1992). "Rank-Revealing LU Factorizations,"
Lin. Alg. and Its Applic. 175, 115-141.

C.H. Bischof and P.C. Hansen (1992). "A Block Algorithm for Computing
Rank-Revealing QR Factorizations," Numerical Algorithms 2, 371-392.

S. Chandrasekaran and I.C.F. Ipsen (1994). "On Rank-Revealing Factorizations," SIAM
J. Matrix Anal. Appl. 15, 592-622.

R.D. Fierro and P.C. Hansen (1995). "Accuracy of TSVD Solutions Computed from
Rank-Revealing Decompositions," Numer. Math. 70, 453-472.


5.5 The Rank Deficient LS Problem


If A is rank deficient, then there are an infinite number of solutions to the 
LS problem and we must resort to special techniques. These techniques 
must address the difficult problem of numerical rank determination. 

After some SVD preliminaries, we show how QR with column pivoting
can be used to determine a minimizer $x_B$ with the property that $A x_B$ is a
linear combination of r = rank(A) columns. We then discuss the minimum
2-norm solution that can be obtained from the SVD.


5.5.1 The Minimum Norm Solution


Suppose $A \in \mathbb{R}^{m \times n}$ and rank(A) = r < n. The rank deficient LS problem
has an infinite number of solutions, for if x is a minimizer and $z \in \mathrm{null}(A)$
then x + z is also a minimizer. The set of all minimizers
\[
\chi = \{ x \in \mathbb{R}^n : \| Ax - b \|_2 = \min \}
\]
is convex, for if $x_1, x_2 \in \chi$ and $\lambda \in [0, 1]$, then
\[
\| A(\lambda x_1 + (1-\lambda) x_2) - b \|_2
\le \lambda \| A x_1 - b \|_2 + (1-\lambda) \| A x_2 - b \|_2
= \min_{x \in \mathbb{R}^n} \| Ax - b \|_2 .
\]
Thus, $\lambda x_1 + (1-\lambda) x_2 \in \chi$. It follows that $\chi$ has a unique element having
minimum 2-norm and we denote this solution by $x_{LS}$. (Note that in the
full rank case, there is only one LS solution and so it must have minimal
2-norm. Thus, we are consistent with the notation in §5.3.)


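5.5.2 Complete Orthogonal Factorization and $x_{LS}$


Any complete orthogonal factorization can be used to compute $x_{LS}$. In
particular, if Q and Z are orthogonal matrices such that
\[
Q^T A Z = T = \begin{bmatrix} T_{11} & 0 \\ 0 & 0 \end{bmatrix}
\begin{matrix} r \\ m-r \end{matrix}
\qquad r = \mathrm{rank}(A)
\]
(with block columns of widths r and n-r), then
\[
\| Ax - b \|_2^2 = \| (Q^T A Z) Z^T x - Q^T b \|_2^2 = \| T_{11} w - c \|_2^2 + \| d \|_2^2
\]
where
\[
Z^T x = \begin{bmatrix} w \\ y \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix}
\qquad
Q^T b = \begin{bmatrix} c \\ d \end{bmatrix} \begin{matrix} r \\ m-r \end{matrix} .
\]
Clearly, if x is to minimize the sum of squares, then we must have $w = T_{11}^{-1} c$.
For x to have minimal 2-norm, y must be zero, and thus,
\[
x_{LS} = Z \begin{bmatrix} T_{11}^{-1} c \\ 0 \end{bmatrix} .
\]


5.5.3 The SVD and the LS Problem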


Of course, the SVD is a particularly revealing complete orthogonal
decomposition. It provides a neat expression for $x_{LS}$ and the norm of the
minimum residual $\rho_{LS} = \| A x_{LS} - b \|_2$.


Theorem 5.5.1 Suppose $U^T A V = \Sigma$ is the SVD of $A \in \mathbb{R}^{m \times n}$ with r =
rank(A). If $U = [\, u_1, \ldots, u_m \,]$ and $V = [\, v_1, \ldots, v_n \,]$ are column
partitionings and $b \in \mathbb{R}^m$, then
\[
x_{LS} = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i
\tag{5.5.1}
\]
minimizes $\| Ax - b \|_2$ and has the smallest 2-norm of all minimizers. Moreover
\[
\rho_{LS}^2 = \| A x_{LS} - b \|_2^2 = \sum_{i=r+1}^{m} (u_i^T b)^2 .
\tag{5.5.2}
\]
Proof. For any $x \in \mathbb{R}^n$ we have:
\[
\| Ax - b \|_2^2 = \| (U^T A V)(V^T x) - U^T b \|_2^2 = \| \Sigma \alpha - U^T b \|_2^2
= \sum_{i=1}^{r} (\sigma_i \alpha_i - u_i^T b)^2 + \sum_{i=r+1}^{m} (u_i^T b)^2
\]
where $\alpha = V^T x$. Clearly, if x solves the LS problem, then $\alpha_i = (u_i^T b / \sigma_i)$ for
i = 1:r. If we set $\alpha(r + 1:n) = 0$, then the resulting x clearly has minimal
2-norm. $\Box$
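As a small check of (5.5.1) and (5.5.2), consider a diagonal example in which the SVD is trivial. The matrix and right-hand side below are illustrative choices, not taken from the text; here $U = V = I_2$, $\sigma_1 = 2$ and r = 1.

\[
A = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
b = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \qquad
x_{LS} = \frac{u_1^T b}{\sigma_1} v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad
\rho_{LS}^2 = (u_2^T b)^2 = 9 .
\]

Every vector $[\, 1, \; t \,]^T$ gives the same residual $Ax - b = [\, 0, \; -3 \,]^T$, so all of them are minimizers, and t = 0 is the one of smallest 2-norm.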


5.5.4 The Pseudo-Inverse 
Note that if we define the matrix $A^+ \in \mathbb{R}^{n \times m}$ by $A^+ = V \Sigma^+ U^T$ where
\[
\Sigma^+ = \mathrm{diag}\left( \frac{1}{\sigma_1}, \ldots, \frac{1}{\sigma_r}, 0, \ldots, 0 \right) \in \mathbb{R}^{n \times m}
\qquad r = \mathrm{rank}(A)
\]
then $x_{LS} = A^+ b$ and $\rho_{LS} = \| (I - A A^+) b \|_2$. $A^+$ is referred to as the
pseudo-inverse of A. It is the unique minimal Frobenius norm solution to
the problem
\[
\min_{X \in \mathbb{R}^{n \times m}} \| AX - I_m \|_F .
\tag{5.5.3}
\]
If rank(A) = n, then $A^+ = (A^T A)^{-1} A^T$, while if m = n = rank(A), then
$A^+ = A^{-1}$. Typically, $A^+$ is defined to be the unique matrix $X \in \mathbb{R}^{n \times m}$
that satisfies the four Moore-Penrose conditions:

    (i)  $AXA = A$        (iii) $(AX)^T = AX$
    (ii) $XAX = X$        (iv)  $(XA)^T = XA$.

These conditions amount to the requirement that $A A^+$ and $A^+ A$ be orthogonal
projections onto ran(A) and ran($A^T$), respectively. Indeed, $A A^+ = U_1 U_1^T$
where $U_1 = U(1:m, 1:r)$ and $A^+ A = V_1 V_1^T$ where $V_1 = V(1:n, 1:r)$.
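The four Moore-Penrose conditions are easy to test numerically for a candidate X. The sketch below uses plain Python lists; the helper names (`matmul`, `transpose`, `close`, `satisfies_moore_penrose`) and the 2-by-2 rank-one example are introduced here for illustration and are not taken from the text.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def close(X, Y, tol=1e-12):
    return all(abs(a - b) <= tol for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

def satisfies_moore_penrose(A, X):
    """True if X passes conditions (i)-(iv) for A up to a small tolerance."""
    AX, XA = matmul(A, X), matmul(X, A)
    return (close(matmul(AX, A), A)          # (i)   AXA = A
            and close(matmul(XA, X), X)      # (ii)  XAX = X
            and close(transpose(AX), AX)     # (iii) (AX)^T = AX
            and close(transpose(XA), XA))    # (iv)  (XA)^T = XA

A = [[1.0, 1.0], [1.0, 1.0]]                 # rank one
A_plus = [[0.25, 0.25], [0.25, 0.25]]        # its pseudo-inverse
X_bad = [[1.0, 0.0], [0.0, 0.0]]             # satisfies (i) but not (iii)

print(satisfies_moore_penrose(A, A_plus))
print(satisfies_moore_penrose(A, X_bad))
```

The second candidate shows why all four conditions are needed: it reproduces A in condition (i), yet A times X_bad is not symmetric and so is not an orthogonal projection.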


5.5.5 Some Sensitivity Issues 


In §5.3 we examined the sensitivity of the full rank LS problem. The
behavior of $x_{LS}$ in this situation is summarized in Theorem 5.3.1. If we drop
the full rank assumptions then $x_{LS}$ is not even a continuous function of the
data and small changes in A and b can induce arbitrarily large changes in
$x_{LS} = A^+ b$. The easiest way to see this is to consider the behavior of the
pseudo-inverse. If A and $\delta A$ are in $\mathbb{R}^{m \times n}$, then Wedin (1973) and Stewart
(1975) show that
\[
\| (A + \delta A)^+ - A^+ \|_F \le 2 \| \delta A \|_F \max \{ \| A^+ \|_2^2 , \; \| (A + \delta A)^+ \|_2^2 \} .
\]
This inequality is a generalization of Theorem 2.3.4 in which perturbations
in the matrix inverse are bounded. However, unlike the square nonsingular
case, the upper bound does not necessarily tend to zero as $\delta A$ tends to zero.
If
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\quad \text{and} \quad
\delta A = \begin{bmatrix} 0 & 0 \\ 0 & \epsilon \\ 0 & 0 \end{bmatrix}
\]
then
\[
A^+ = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\quad \text{and} \quad
(A + \delta A)^+ = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/\epsilon & 0 \end{bmatrix}
\]
and $\| A^+ - (A + \delta A)^+ \|_2 = 1/\epsilon$. The numerical determination of an LS
minimizer in the presence of such discontinuities is a major challenge.


5.5.6 QR with Column Pivoting and Basic Solutions 


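Suppose $A \in \mathbb{R}^{m \times n}$ has rank r. QR with column pivoting (Algorithm 5.4.1)
produces the factorization $A\Pi = QR$ where
\[
R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix}
\begin{matrix} r \\ m-r \end{matrix}
\]
with block columns of widths r and n-r. Given this reduction, the LS problem
can be readily solved. Indeed, for any $x \in \mathbb{R}^n$ we have
\[
\| Ax - b \|_2^2 = \| (Q^T A \Pi)(\Pi^T x) - (Q^T b) \|_2^2
= \| R_{11} y - (c - R_{12} z) \|_2^2 + \| d \|_2^2 ,
\]
where
\[
\Pi^T x = \begin{bmatrix} y \\ z \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix}
\quad \text{and} \quad
Q^T b = \begin{bmatrix} c \\ d \end{bmatrix} \begin{matrix} r \\ m-r \end{matrix} .
\]
Thus, if x is an LS minimizer, then we must have
\[
x = \Pi \begin{bmatrix} R_{11}^{-1} (c - R_{12} z) \\ z \end{bmatrix} .
\]
If z is set to zero in this expression, then we obtain the basic solution
\[
x_B = \Pi \begin{bmatrix} R_{11}^{-1} c \\ 0 \end{bmatrix} .
\]
Notice that $x_B$ has at most r nonzero components and so $A x_B$ involves a
subset of A's columns.

The basic solution is not the minimal 2-norm solution unless the
submatrix $R_{12}$ is zero since
\[
\| x_{LS} \|_2 = \min_{z \in \mathbb{R}^{n-r}}
\left\| \, x_B - \Pi \begin{bmatrix} R_{11}^{-1} R_{12} \\ -I_{n-r} \end{bmatrix} z \, \right\|_2 .
\tag{5.5.4}
\]
Indeed, this characterization of $\| x_{LS} \|_2$ can be used to show
\[
1 \le \frac{\| x_B \|_2}{\| x_{LS} \|_2} \le \sqrt{ 1 + \| R_{11}^{-1} R_{12} \|_2^2 } .
\tag{5.5.5}
\]
See Golub and Pereyra (1976) for details.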
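A one-row example shows that the gap between the basic and minimum norm solutions can be as large as the bound $\| x_B \|_2 / \| x_{LS} \|_2 \le \sqrt{1 + \| R_{11}^{-1} R_{12} \|_2^2}$ allows. The data below are an illustrative choice, not taken from the text. Take m = 1, n = 2 and

\[
A = [\, 1 \;\; 1 \,], \qquad b = 2, \qquad \Pi = I, \quad Q = 1, \quad R_{11} = 1, \quad R_{12} = 1, \quad c = 2 .
\]

Every minimizer has the form $x = [\, 2 - z, \; z \,]^T$. Setting z = 0 gives the basic solution, while minimizing $(2 - z)^2 + z^2$ gives z = 1:

\[
x_B = \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \qquad
x_{LS} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad
\frac{\| x_B \|_2}{\| x_{LS} \|_2} = \frac{2}{\sqrt{2}} = \sqrt{2} = \sqrt{1 + \| R_{11}^{-1} R_{12} \|_2^2} .
\]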


5.5.7 Numerical Rank Determination with AII = QR 


If Algorithm 5.4.1 is used to compute $x_B$, then care must be exercised in
the determination of rank(A). In order to appreciate the difficulty of this,
suppose
\[
fl(H_k \cdots H_1 A \Pi_1 \cdots \Pi_k) = \hat R^{(k)} =
\begin{bmatrix} \hat R_{11}^{(k)} & \hat R_{12}^{(k)} \\ 0 & \hat R_{22}^{(k)} \end{bmatrix}
\begin{matrix} k \\ m-k \end{matrix}
\]
(with block columns of widths k and n-k) is the matrix computed after k steps of
the algorithm have been executed in floating point. Suppose rank(A) = k. Because
of roundoff error, $\hat R_{22}^{(k)}$ will not be exactly zero. However, if
$\hat R_{22}^{(k)}$ is suitably small in norm then it is reasonable to terminate the
reduction and declare A to have rank k. A typical termination criterion might be
\[
\| \hat R_{22}^{(k)} \|_2 \le \epsilon_1 \| A \|_2
\tag{5.5.6}
\]
for some small machine-dependent parameter $\epsilon_1$. In view of the roundoff
properties associated with Householder matrix computation (cf. §5.1.12),
we know that $\hat R^{(k)}$ is the exact R factor of a matrix $A + E_k$, where
\[
\| E_k \|_2 \le \epsilon_2 \| A \|_2 \qquad \epsilon_2 = O(u).
\]
Using Theorem 2.5.2 we have
\[
\sigma_{k+1}(A + E_k) = \sigma_{k+1}(\hat R^{(k)}) \le \| \hat R_{22}^{(k)} \|_2 .
\]
Since $\sigma_{k+1}(A) \le \sigma_{k+1}(A + E_k) + \| E_k \|_2$, it follows that
\[
\sigma_{k+1}(A) \le (\epsilon_1 + \epsilon_2) \| A \|_2 .
\]

In other words, a relative perturbation of $O(\epsilon_1 + \epsilon_2)$ in A can yield a rank-k
matrix. With this termination criterion, we conclude that QR with column
pivoting "discovers" rank degeneracy if in the course of the reduction
$\hat R_{22}^{(k)}$ is small for some k < n.

Unfortunately, this is not always the case. A matrix can be nearly rank
deficient without a single $\hat R_{22}^{(k)}$ being particularly small. Thus, QR with
column pivoting by itself is not entirely reliable as a method for detecting
near rank deficiency. However, if a good condition estimator is applied to
R it is practically impossible for near rank deficiency to go unnoticed.


Example 5.5.1 Let $T_n(c)$ be the matrix

\[
T_n(c) = \mathrm{diag}(1, s, \ldots, s^{n-1})
\begin{bmatrix}
1 & -c & -c & \cdots & -c \\
0 & 1 & -c & \cdots & -c \\
\vdots & & \ddots & & \vdots \\
 & & & 1 & -c \\
0 & \cdots & & 0 & 1
\end{bmatrix}
\]

with $c^2 + s^2 = 1$ and c, s > 0. (See Lawson and Hanson (1974, p.31).) These matrices are
unaltered by Algorithm 5.4.1 and thus $\| R_{22}^{(k)} \|_2 \ge s^{n-1}$ for k = 1:n-1. This inequality
implies (for example) that the matrix $T_{100}(.2)$ has no particularly small trailing principal
submatrix since $s^{99} \approx .13$. However, it can be shown that $\sigma_n = O(10^{-8})$.
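The matrix in this example is easy to build and inspect. The Python sketch below (the function name `kahan` is introduced here) constructs it for n = 100 and c = .2 and checks two facts that explain the behavior: every column has unit 2-norm, so the smallest-index tie rule of Algorithm 5.4.1 has no reason to interchange columns at the first step, and the last diagonal entry is $s^{99}$. It does not compute the smallest singular value.

```python
import math

def kahan(n, c):
    """Upper triangular T_n(c): diag(1, s, ..., s^(n-1)) times the unit
    upper triangular matrix with -c above the diagonal, c^2 + s^2 = 1."""
    s = math.sqrt(1.0 - c * c)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        scale = s ** i
        T[i][i] = scale
        for j in range(i + 1, n):
            T[i][j] = -c * scale
    return T

n = 100
T = kahan(n, 0.2)
col_norms = [math.sqrt(sum(T[i][j] ** 2 for i in range(n))) for j in range(n)]
print(min(col_norms), max(col_norms))   # both equal 1 up to roundoff
print(T[n - 1][n - 1])                  # s**99, roughly 0.13
```

The unit column norms follow from $c^2 (1 + s^2 + \cdots + s^{2(j-2)}) + s^{2(j-1)} = 1$, which uses $c^2 = 1 - s^2$.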


5.5.8 Numerical Rank and the SVD 


We now focus our attention on the ability of the SVD to handle rank
deficiency in the presence of roundoff. Recall that if $A = U \Sigma V^T$ is the
SVD of A, then
\[
x_{LS} = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i
\tag{5.5.7}
\]
where r = rank(A). Denote the computed versions of U, V, and $\Sigma = \mathrm{diag}(\sigma_i)$
by $\hat U$, $\hat V$, and $\hat\Sigma = \mathrm{diag}(\hat\sigma_i)$. Assume that both sequences of singular
values range from largest to smallest. For a reasonably implemented SVD
algorithm it can be shown that
\[
\hat U = W + \Delta U \qquad W^T W = I_m \qquad \| \Delta U \|_2 \le \epsilon
\tag{5.5.8}
\]
\[
\hat V = Z + \Delta V \qquad Z^T Z = I_n \qquad \| \Delta V \|_2 \le \epsilon
\tag{5.5.9}
\]
\[
\hat\Sigma = W^T (A + \Delta A) Z \qquad \| \Delta A \|_2 \le \epsilon \| A \|_2
\tag{5.5.10}
\]
where $\epsilon$ is a small multiple of u, the machine precision. In plain English, the
SVD algorithm computes the singular values of a "nearby" matrix $A + \Delta A$.

Note that $\hat U$ and $\hat V$ are not necessarily close to their exact counterparts.
However, we can show that $\hat\sigma_k$ is close to $\sigma_k$. Using (5.5.10) and Theorem
2.5.2 we have
\[
\sigma_k = \min_{\mathrm{rank}(B) = k-1} \| A - B \|_2
= \min_{\mathrm{rank}(B) = k-1} \| (\hat\Sigma - B) - W^T (\Delta A) Z \|_2 .
\]
Since $\| W^T (\Delta A) Z \|_2 \le \epsilon \| A \|_2 = \epsilon \sigma_1$ and
\[
\min_{\mathrm{rank}(B) = k-1} \| \hat\Sigma - B \|_2 = \hat\sigma_k
\]
it follows that $| \sigma_k - \hat\sigma_k | \le \epsilon \sigma_1$ for k = 1:n. Thus, if A has rank r then we
can expect n - r of the computed singular values to be small. Near rank
deficiency in A cannot escape detection when the SVD of A is computed.


Example 5.5.2 For the matrix $T_{100}(.2)$ in Example 5.5.1, $\sigma_n \approx .367 \cdot 10^{-8}$.

One approach to estimating r = rank(A) from the computed singular
values is to have a tolerance $\delta > 0$ and a convention that A has "numerical
rank" $\hat r$ if the $\hat\sigma_i$ satisfy
\[
\hat\sigma_1 \ge \cdots \ge \hat\sigma_{\hat r} > \delta \ge \hat\sigma_{\hat r + 1} \ge \cdots \ge \hat\sigma_n .
\]
The tolerance $\delta$ should be consistent with the machine precision, e.g.
$\delta = u \| A \|_\infty$. However, if the general level of relative error in the data is larger
than u, then $\delta$ should be correspondingly bigger, e.g., $\delta = 10^{-2} \| A \|_\infty$ if
the entries in A are correct to two digits.
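The numerical rank convention amounts to counting the computed singular values that exceed the tolerance. A minimal Python sketch follows; the function names and the sample singular values are hypothetical and chosen only to show how the answer depends on the tolerance.

```python
def inf_norm(A):
    """Matrix infinity norm: largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def numerical_rank(sigma_hat, delta):
    """Number of leading singular values exceeding delta.
    sigma_hat is assumed sorted from largest to smallest."""
    r_hat = 0
    for s in sigma_hat:
        if s > delta:
            r_hat += 1
        else:
            break
    return r_hat

sigma_hat = [3.0, 1.0, 1e-9, 1e-17]         # hypothetical computed values
print(numerical_rank(sigma_hat, 1e-12))     # tight tolerance
print(numerical_rank(sigma_hat, 1e-2))      # data trusted to two digits only
```

With the tight tolerance the value 1e-9 counts as nonzero and the numerical rank is 3; with the looser tolerance it is treated as noise and the numerical rank drops to 2. In practice the tolerance would be formed from the data, for example as a multiple of `inf_norm(A)`.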

If $\hat r$ is accepted as the numerical rank then we can regard
\[
x_{\hat r} = \sum_{i=1}^{\hat r} \frac{\hat u_i^T b}{\hat\sigma_i} \hat v_i
\]
as an approximation to $x_{LS}$. Since $\| x_{\hat r} \|_2 \approx 1/\hat\sigma_{\hat r} \le 1/\delta$ then $\delta$ may also
be chosen with the intention of producing an approximate LS solution with
suitably small norm. In §12.1, we discuss more sophisticated methods for
doing this.

If $\hat\sigma_{\hat r} \gg \delta$, then we have reason to be comfortable with $x_{\hat r}$ because A
can then be unambiguously regarded as a rank($A_{\hat r}$) matrix (modulo $\delta$).

On the other hand, $\{ \hat\sigma_1, \ldots, \hat\sigma_n \}$ might not clearly split into subsets
of small and large singular values, making the determination of $\hat r$ by this
means somewhat arbitrary. This leads to more complicated methods for
estimating rank which we now discuss in the context of the LS problem.

For example, suppose r = n, and assume for the moment that $\Delta A = 0$
in (5.5.10). Thus $\sigma_i = \hat\sigma_i$ for i = 1:n. Denote the ith columns of the
matrices $\hat U$, W, $\hat V$, and Z by $u_i$, $w_i$, $v_i$, and $z_i$, respectively. Subtracting
$x_{\hat r}$ from $x_{LS}$ and taking norms we obtain
\[
\| x_{\hat r} - x_{LS} \|_2 \le
\sum_{i=1}^{\hat r} \frac{ \| (w_i^T b) z_i - (u_i^T b) v_i \|_2 }{ \sigma_i }
+ \sqrt{ \sum_{i=\hat r+1}^{n} \left( \frac{w_i^T b}{\sigma_i} \right)^2 } .
\]
From (5.5.8) and (5.5.9) it is easy to verify that
\[
\| (w_i^T b) z_i - (u_i^T b) v_i \|_2 \le 2 (1 + \epsilon) \epsilon \| b \|_2
\tag{5.5.11}
\]
and therefore
\[
\| x_{\hat r} - x_{LS} \|_2 \le \frac{\hat r}{\hat\sigma_{\hat r}} \, 2 (1 + \epsilon) \epsilon \| b \|_2
+ \sqrt{ \sum_{i=\hat r+1}^{n} \left( \frac{w_i^T b}{\hat\sigma_i} \right)^2 } .
\]

The parameter $\hat r$ can be determined as that integer which minimizes the
upper bound. Notice that the first term in the bound increases with $\hat r$,
while the second decreases.

On occasions when minimizing the residual is more important than
accuracy in the solution, we can determine $\hat r$ on the basis of how close we
surmise $\| b - A x_{\hat r} \|_2$ is to the true minimum. Paralleling the above
analysis, it can be shown that
\[
\| b - A x_{\hat r} \|_2 - \| b - A x_{LS} \|_2 \le (n - \hat r) \| b \|_2
+ \epsilon \hat r \| b \|_2 \left( 1 + (1 + \epsilon) \frac{\hat\sigma_1}{\hat\sigma_{\hat r}} \right) .
\]

Again $\hat r$ could be chosen to minimize the upper bound. See Varah (1973)
for practical details and also the LAPACK manual.


5.5.9 Some Comparisons 


As we mentioned, when solving the LS problem via the SVD, only $\Sigma$ and
V have to be computed. The following table compares the efficiency of this
approach with the other algorithms that we have presented.

    LS Algorithm                      Flop Count
    Normal Equations                  $mn^2 + n^3/3$
    Householder Orthogonalization     $2mn^2 - 2n^3/3$
    Modified Gram Schmidt             $2mn^2$
    Givens Orthogonalization          $3mn^2 - n^3$
    Householder Bidiagonalization     $4mn^2 - 4n^3/3$
    R-Bidiagonalization               $2mn^2 + 2n^3$
    Golub-Reinsch SVD                 $4mn^2 + 8n^3$
    R-SVD                             $2mn^2 + 11n^3$


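Problems
P5.5.1 Show that if
\[
A = \begin{bmatrix} T & S \\ 0 & 0 \end{bmatrix}
\begin{matrix} r \\ m-r \end{matrix}
\]
(block columns of widths r and n-r), where r = rank(A) and T is nonsingular, then
\[
X = \begin{bmatrix} T^{-1} & 0 \\ 0 & 0 \end{bmatrix}
\begin{matrix} r \\ n-r \end{matrix}
\]
(block columns of widths r and m-r) satisfies $AXA = A$ and $(AX)^T = (AX)$. In this
case, we say that X is a (1,3) pseudo-inverse of A. Show that for general A,
$x_B = Xb$ where X is a (1,3) pseudo-inverse of A.


P5.5.2 Define $B(\lambda) \in \mathbb{R}^{n \times m}$ by $B(\lambda) = (A^T A + \lambda I)^{-1} A^T$, where $\lambda > 0$. Show
\[
\| B(\lambda) - A^+ \|_2 = \frac{\lambda}{\sigma_r(A) [\, \sigma_r(A)^2 + \lambda \,]} \qquad r = \mathrm{rank}(A)
\]
and therefore that $B(\lambda) \to A^+$ as $\lambda \to 0$.
P5.5.3 Consider the rank deficient LS problem
\[
\min_{y \in \mathbb{R}^r, \; z \in \mathbb{R}^{n-r}}
\left\| \begin{bmatrix} R & S \\ 0 & 0 \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix}
- \begin{bmatrix} c \\ d \end{bmatrix} \right\|_2
\]
where $R \in \mathbb{R}^{r \times r}$, $S \in \mathbb{R}^{r \times (n-r)}$, $y \in \mathbb{R}^r$, and $z \in \mathbb{R}^{n-r}$. Assume that R is upper
triangular and nonsingular. Show how to obtain the minimum norm solution to this problem
by computing an appropriate QR factorization without pivoting and then solving for the
appropriate y and z.
P5.5.4 Show that if $A_k \to A$ and $A_k^+ \to A^+$, then there exists an integer $k_0$ such that
rank($A_k$) is constant for all $k \ge k_0$.
P5.5.5 Show that if $A \in \mathbb{R}^{m \times n}$ has rank n, then so does A + E if we have the inequality
$\| E \|_2 \| A^+ \|_2 < 1$.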


Notes and References for Sec. 5.5 
The pseudo-inverse literature is vast, as evidenced by the 1,775 references in
M.Z. Nashed (1976). Generalized Inverses and Applications, Academic Press, New York.
The differentiation of the pseudo-inverse is further discussed in


C.L. Lawson and R.J. Hanson (1969). "Extensions and Applications of the Householder
Algorithm for Solving Linear Least Squares Problems," Math. Comp. 23, 787-812.

G.H. Golub and V. Pereyra (1973). "The Differentiation of Pseudo-Inverses and
Nonlinear Least Squares Problems Whose Variables Separate," SIAM J. Num. Anal. 10,
413-32.


Survey treatments of LS perturbation theory may be found in Lawson and Hanson
(1974), Stewart and Sun (1991), Björck (1996), and


P.A. Wedin (1973). "Perturbation Theory for Pseudo-Inverses," BIT 13, 217-32.

G.W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Projections, and Linear
Least Squares," SIAM Review 19, 634-62.

Even for full rank problems, column pivoting seems to produce more accurate solutions.
The error analysis in the following paper attempts to explain why.

L.S. Jennings and M.R. Osborne (1974). "A Direct Error Analysis for Least Squares,"
Numer. Math. 22, 322-32.

Various other aspects of rank deficiency are discussed in

J.M. Varah (1973). "On the Numerical Solution of Ill-Conditioned Linear Systems with
Applications to Ill-Posed Problems," SIAM J. Num. Anal. 10, 257-67.

G.W. Stewart (1984). "Rank Degeneracy," SIAM J. Sci. and Stat. Comp. 5, 403-413.

P.C. Hansen (1987). "The Truncated SVD as a Method for Regularization," BIT 27,
534-553.

G.W. Stewart (1987). "Collinearity and Least Squares Regression," Statistical Science
2, 68-100.


We have more to say on the subject in §12.1 and §12.2.


5.6 Weighting and Iterative Improvement

The concepts of scaling and iterative improvement were introduced in the
Chapter 3 context of square linear systems. Generalizations of these ideas
that are applicable to the least squares problem are now offered.


5.6.1 Column Weighting

Suppose $G \in \mathbb{R}^{n \times n}$ is nonsingular. A solution to the LS problem
\[
\min \| Ax - b \|_2 \qquad A \in \mathbb{R}^{m \times n}, \; b \in \mathbb{R}^m
\tag{5.6.1}
\]
can be obtained by finding the minimum 2-norm solution $y_{LS}$ to
\[
\min \| (AG) y - b \|_2
\tag{5.6.2}
\]
and then setting $x_G = G y_{LS}$. If rank(A) = n, then $x_G = x_{LS}$. Otherwise,
$x_G$ is the minimum G-norm solution to (5.6.1), where the G-norm is defined
by $\| z \|_G = \| G^{-1} z \|_2$.

The choice of G is important. Sometimes its selection can be based on
a priori knowledge of the uncertainties in A. On other occasions, it may be
desirable to normalize the columns of A by setting
\[
G = G_0 \equiv \mathrm{diag}( 1/\| A(:, 1) \|_2 , \ldots , 1/\| A(:, n) \|_2 ).
\]
Van der Sluis (1969) has shown that with this choice, $\kappa_2(AG)$ is
approximately minimized. Since the computed accuracy of $y_{LS}$ depends on $\kappa_2(AG)$,
a case can be made for setting $G = G_0$.
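Forming the column scaling is a few lines of code. The Python sketch below (function names introduced here; it assumes A has no zero column) builds the diagonal of the scaling, applies it to A, and maps a solution y of the scaled problem back through x = Gy. The LS solve itself is not shown.

```python
import math

def column_scaling(A):
    """Return (g, AG): g[j] = 1/||A(:, j)||_2 and AG has unit-norm columns.
    Assumes no column of A is zero."""
    m, n = len(A), len(A[0])
    g = [1.0 / math.sqrt(sum(A[i][j] ** 2 for i in range(m))) for j in range(n)]
    AG = [[A[i][j] * g[j] for j in range(n)] for i in range(m)]
    return g, AG

def unscale(g, y):
    """Recover x = G y from a solution y of the scaled problem."""
    return [gj * yj for gj, yj in zip(g, y)]

A = [[3.0, 0.0], [4.0, 1000.0]]
g, AG = column_scaling(A)
print(g)
print(AG)
```

Here the two columns of A differ in size by a factor of 200, while both columns of AG have unit length.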

We remark that column weighting affects singular values. Consequently, 
a scheme for determining numerical rank may not return the same estimates 
when applied to A and AG. See Stewart (1984b). 


5.6.2 Row Weighting 


Let $D = \mathrm{diag}(d_1, \ldots, d_m)$ be nonsingular and consider the weighted least
squares problem
\[
\text{minimize } \| D(Ax - b) \|_2 \qquad A \in \mathbb{R}^{m \times n}, \; b \in \mathbb{R}^m .
\tag{5.6.3}
\]
Assume rank(A) = n and that $x_D$ solves (5.6.3). It follows that the solution
$x_{LS}$ to (5.6.1) satisfies
\[
x_D - x_{LS} = (A^T D^2 A)^{-1} A^T (D^2 - I)(b - A x_{LS}).
\tag{5.6.4}
\]
This shows that row weighting in the LS problem affects the solution. (An
important exception occurs when $b \in \mathrm{ran}(A)$ for then $x_D = x_{LS}$.)

One way of determining D is to let $d_k$ be some measure of the
uncertainty in $b_k$, e.g., the reciprocal of the standard deviation in $b_k$. The
tendency is for $r_k = e_k^T (b - A x_D)$ to be small whenever $d_k$ is large. The
precise effect of $d_k$ on $r_k$ can be clarified as follows. Define
\[
D(\delta) = \mathrm{diag}( d_1, \ldots, d_{k-1}, d_k \sqrt{1 + \delta}, d_{k+1}, \ldots, d_m )
\]
where $\delta > -1$. If $x(\delta)$ minimizes $\| D(\delta)(Ax - b) \|_2$ and $r_k(\delta)$ is the k-th
component of $b - A x(\delta)$, then it can be shown that
\[
r_k(\delta) = \frac{ r_k }{ 1 + \delta d_k^2 e_k^T A (A^T D^2 A)^{-1} A^T e_k } .
\tag{5.6.5}
\]
This explicit expression shows that $r_k(\delta)$ is a monotone decreasing function
of $\delta$. Of course, how $r_k$ changes when all the weights are varied is much
more complicated.


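Example 5.6.1 Suppose
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} .
\]
If D = I then $x_D = [\, -1, \; .85 \,]^T$ and $r = b - A x_D = [\, .3, \; -.4, \; -.1, \; .2 \,]^T$. On
the other hand, if D = diag(1000, 1, 1, 1) then we have $x_D \approx [\, -1.43, \; 1.21 \,]^T$ and
$r = b - A x_D = [\, .000428, \; -.571428, \; -.142853, \; .285714 \,]^T$.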


5.6.3 Generalized Least Squares


In many estimation problems, the vector of observations b is related to x
through the equation
\[
b = Ax + w
\tag{5.6.6}
\]
where the noise vector w has zero mean and a symmetric positive definite
variance-covariance matrix $\sigma^2 W$. Assume that W is known and that
$W = B B^T$ for some $B \in \mathbb{R}^{m \times m}$. The matrix B might be given or it might
be W's Cholesky triangle. In order that all the equations in (5.6.6)
contribute equally to the determination of x, statisticians frequently solve the
LS problem
\[
\min \| B^{-1} (Ax - b) \|_2 .
\tag{5.6.7}
\]
An obvious computational approach to this problem is to form $\tilde A = B^{-1} A$
and $\tilde b = B^{-1} b$ and then apply any of our previous techniques to minimize
$\| \tilde A x - \tilde b \|_2$. Unfortunately, x will be poorly determined by such a
procedure if B is ill-conditioned.

A much more stable way of solving (5.6.7) using orthogonal
transformations has been suggested by Paige (1979a, 1979b). It is based on the idea
that (5.6.7) is equivalent to the generalized least squares problem,
\[
\min_{b = Ax + Bv} v^T v .
\tag{5.6.8}
\]
Notice that this problem is defined even if A and B are rank deficient.
Although Paige's technique can be applied when this is the case, we shall
describe it under the assumption that both these matrices have full rank.
The first step is to compute the QR factorization of A:
\[
Q^T A = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \begin{matrix} n \\ m-n \end{matrix}
\qquad Q = [\, Q_1 \;\; Q_2 \,]
\]
where $Q_1$ has n columns and $Q_2$ has m-n columns.
An orthogonal matrix $Z \in \mathbb{R}^{m \times m}$ is then determined so that
\[
Q_2^T B Z = [\, 0 \;\; S \,] \qquad Z = [\, Z_1 \;\; Z_2 \,]
\]
where the block columns have widths n and m-n and S is upper triangular.
With the use of these orthogonal matrices the constraint in (5.6.8) transforms to
\[
\begin{bmatrix} Q_1^T b \\ Q_2^T b \end{bmatrix}
= \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x
+ \begin{bmatrix} Q_1^T B Z_1 & Q_1^T B Z_2 \\ 0 & S \end{bmatrix}
\begin{bmatrix} Z_1^T v \\ Z_2^T v \end{bmatrix} .
\]
Notice that the "bottom half" of this equation determines v,
\[
S u = Q_2^T b \qquad v = Z_2 u ,
\tag{5.6.9}
\]
while the "top half" prescribes x:
\[
R_1 x = Q_1^T b - (Q_1^T B Z_1 Z_1^T + Q_1^T B Z_2 Z_2^T) v = Q_1^T b - Q_1^T B Z_2 u .
\tag{5.6.10}
\]
The attractiveness of this method is that all potential ill-conditioning is
concentrated in triangular systems (5.6.9) and (5.6.10). Moreover, Paige
(1979b) has shown that the above procedure is numerically stable,
something that is not true of any method that explicitly forms $B^{-1} A$.


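5.6.4 Iterative Improvement


A technique for refining an approximate LS solution has been analyzed by
Björck (1967, 1968). It is based on the idea that if
\[
\begin{bmatrix} I_m & A \\ A^T & 0 \end{bmatrix}
\begin{bmatrix} r \\ x \end{bmatrix}
= \begin{bmatrix} b \\ 0 \end{bmatrix}
\qquad A \in \mathbb{R}^{m \times n}, \; b \in \mathbb{R}^m
\tag{5.6.11}
\]
then $\| b - Ax \|_2 = \min$. This follows because r + Ax = b and $A^T r = 0$ imply
$A^T A x = A^T b$. The above augmented system is nonsingular if rank(A) =
n, which we hereafter assume.

By casting the LS problem in the form of a square linear system, the
iterative improvement scheme (3.5.5) can be applied:

    r^(0) = 0; x^(0) = 0
    for k = 0, 1, ...
        [ f^(k) ; g^(k) ] = [ b ; 0 ] - [ I  A ; A^T  0 ] [ r^(k) ; x^(k) ]
        Solve [ I  A ; A^T  0 ] [ p^(k) ; z^(k) ] = [ f^(k) ; g^(k) ]
        [ r^(k+1) ; x^(k+1) ] = [ r^(k) ; x^(k) ] + [ p^(k) ; z^(k) ]
    end

The residuals $f^{(k)}$ and $g^{(k)}$ must be computed in higher precision and an
original copy of A must be around for this purpose.

If the QR factorization of A is available, then the solution of the
augmented system is readily obtained. In particular, if A = QR and $R_1$ =
R(1:n, 1:n), then a system of the form
\[
\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix}
\begin{bmatrix} p \\ z \end{bmatrix}
= \begin{bmatrix} f \\ g \end{bmatrix}
\]
transforms to
\[
\begin{bmatrix} I_n & 0 & R_1 \\ 0 & I_{m-n} & 0 \\ R_1^T & 0 & 0 \end{bmatrix}
\begin{bmatrix} h \\ f_2 \\ z \end{bmatrix}
= \begin{bmatrix} f_1 \\ f_2 \\ g \end{bmatrix}
\]
where
\[
Q^T f = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} \begin{matrix} n \\ m-n \end{matrix}
\qquad
Q^T p = \begin{bmatrix} h \\ f_2 \end{bmatrix} \begin{matrix} n \\ m-n \end{matrix} .
\]
Thus, p and z can be determined by solving the triangular systems $R_1^T h = g$
and $R_1 z = f_1 - h$ and setting $p = Q \begin{bmatrix} h \\ f_2 \end{bmatrix}$. Assuming that Q is stored in
factored form, each iteration requires $8mn - 2n^2$ flops.

The key to the iteration's success is that both the LS residual and
solution are updated, not just the solution. Björck (1968) shows that if
$\kappa_2(A) \approx \beta^q$ and t-digit, $\beta$-base arithmetic is used, then $x^{(k)}$ has
approximately k(t - q) correct base $\beta$ digits, provided the residuals are computed
in double precision. Notice that it is $\kappa_2(A)$, not $\kappa_2(A)^2$, that appears in
this heuristic.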

Problems


P5.6.1 Verify (5.6.4).
P5.6.2 Let $A \in \mathbb{R}^{m \times n}$ have full rank and define the diagonal matrix
\[
\Delta = \mathrm{diag}( \underbrace{1, \ldots, 1}_{k-1}, \sqrt{1 + \delta}, \underbrace{1, \ldots, 1}_{m-k} )
\]
for $\delta > -1$. Denote the LS solution to $\min \| \Delta (Ax - b) \|_2$ by $x(\delta)$ and its residual by
$r(\delta) = b - A x(\delta)$. (a) Show
\[
r(\delta) = \left( I - \delta \frac{ A (A^T A)^{-1} A^T e_k e_k^T }{ 1 + \delta e_k^T A (A^T A)^{-1} A^T e_k } \right) r(0).
\]
(b) Letting $r_k(\delta)$ stand for the kth component of $r(\delta)$, show
\[
r_k(\delta) = \frac{ r_k(0) }{ 1 + \delta e_k^T A (A^T A)^{-1} A^T e_k } .
\]
(c) Use (b) to verify (5.6.5).
P5.6.3 Show how the SVD can be used to solve the generalized LS problem when the
matrices A and B in (5.6.8) are rank deficient.
P5.6.4 Let $A \in \mathbb{R}^{m \times n}$ have rank n and for $\alpha \ge 0$ define
\[
M(\alpha) = \begin{bmatrix} \alpha I_m & A \\ A^T & 0 \end{bmatrix} .
\]
Show that
\[
\sigma_{m+n}(M(\alpha)) = \min \left\{ \alpha , \; -\frac{\alpha}{2} + \sqrt{ \sigma_n(A)^2 + \left( \frac{\alpha}{2} \right)^2 } \right\}
\]
and determine the value of $\alpha$ that minimizes $\kappa_2(M(\alpha))$.
P5.6.5 Another iterative improvement method for LS problems is the following:

    x^(0) = 0
    for k = 0, 1, ...
        r^(k) = b - A x^(k)   (double precision)
        || A z^(k) - r^(k) ||_2 = min
        x^(k+1) = x^(k) + z^(k)
    end

(a) Assuming that the QR factorization of A is available, how many flops per iteration
are required? (b) Show that the above iteration results by setting $g^{(k)} = 0$ in the
iterative improvement scheme given in §5.6.4.


Notes and References for Sec. 5.6 


Row and column weighting in the LS problem is discussed in Lawson and Hanson (SLS,
pp. 180-88). The various effects of scaling are discussed in


A. van der Sluis (1969). “Condition Numbers and Equilibration of Matrices,” Numer.
Math. 14, 14-23.

G.W. Stewart (1984b). “On the Asymptotic Behavior of Scaled Singular Value and QR
Decompositions,” Math. Comp. 43, 483-490.


The theoretical and computational aspects of the generalized least squares problem ap- 
pear in 


S. Kourouklis and C.C. Paige (1981). “A Constrained Least Squares Approach to the
General Gauss-Markov Linear Model,” J. Amer. Stat. Assoc. 76, 820-25.

C.C. Paige (1979a). “Computer Solution and Perturbation Analysis of Generalized Least
Squares Problems,” Math. Comp. 33, 171-84.

C.C. Paige (1979b). “Fast Numerically Stable Computations for Generalized Linear
Least Squares Problems,” SIAM J. Num. Anal. 16, 165-71.

C.C. Paige (1985). “The General Linear Model and the Generalized Singular Value
Decomposition,” Lin. Alg. and Its Applic. 70, 269-284.


Iterative improvement in the least squares context is discussed in 


G.H. Golub and J.H. Wilkinson (1966). “Note on Iterative Refinement of Least Squares
Solutions,” Numer. Math. 9, 139-48.

Å. Björck and G.H. Golub (1967). “Iterative Refinement of Linear Least Squares
Solutions by Householder Transformation,” BIT 7, 322-37.

Å. Björck (1967). “Iterative Refinement of Linear Least Squares Solutions I,” BIT 7,
257-78.

Å. Björck (1968). “Iterative Refinement of Linear Least Squares Solutions II,” BIT 8,
8-30.

Å. Björck (1987). “Stability Analysis of the Method of Seminormal Equations for Linear
Least Squares Problems,” Lin. Alg. and Its Applic. 88/89, 31-48.


5.7 Square and Underdetermined Systems


The orthogonalization methods developed in this chapter can be applied to 
square systems and also to systems in which there are fewer equations than 
unknowns. In this brief section we discuss some of the various possibilities. 


5.7.1 Using QR and SVD to Solve Square Systems 


The least squares solvers based on the QR factorization and the SVD can
be used to solve square linear systems: just set $m = n$. However, from
the flop point of view, Gaussian elimination is the cheapest way to solve
a square linear system as shown in the following table which assumes that
the right hand side is available at the time of factorization:


    Method                           Flops
    Gaussian Elimination             $2n^3/3$
    Householder Orthogonalization    $4n^3/3$
    Modified Gram-Schmidt            $2n^3$
    Bidiagonalization                $8n^3/3$
    Singular Value Decomposition     $12n^3$

    Table 5.7.1

Nevertheless, there are three reasons why orthogonalization methods might 
be considered: 


• The flop counts tend to exaggerate the Gaussian elimination advantage.
  When memory traffic and vectorization overheads are considered,
  the QR approach is comparable in efficiency.


• The orthogonalization methods have guaranteed stability; there is no
  “growth factor” to worry about as in Gaussian elimination.


• In cases of ill-conditioning, the orthogonal methods give an added
  measure of reliability. QR with condition estimation is very dependable
  and, of course, SVD is unsurpassed when it comes to producing
  a meaningful solution to a nearly singular system.


We are not expressing a strong preference for orthogonalization methods 
but merely suggesting viable alternatives to Gaussian elimination. 

We also mention that the SVD entry in Table 5.7.1 assumes the availability
of $b$ at the time of decomposition. Otherwise, $20n^3$ flops are required
because it then becomes necessary to accumulate the $U$ matrix.

If the QR factorization is used to solve $Ax = b$, then we ordinarily
have to carry out a back substitution: $Rx = Q^T b$. However, this can be
avoided by “preprocessing” $b$. Suppose $H$ is a Householder matrix such
that $Hb = \beta e_n$ where $e_n$ is the last column of $I_n$. If we compute the QR
factorization of $(HA)^T$, then $A = H^T R^T Q^T$ and the system transforms to

$$R^T y = \beta e_n$$

where $y = Q^T x$. Since $R^T$ is lower triangular, $y = (\beta / r_{nn}) e_n$ and so

$$x = \frac{\beta}{r_{nn}} Q(:, n).$$

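The preprocessing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, matrices are plain lists of rows, and modified Gram-Schmidt stands in for the QR factorization only because it is short to write down.

```python
import math

def householder_to_last(b):
    """Return (v, beta) with H = I - 2 v v^T / (v^T v) and H b = beta * e_n."""
    norm_b = math.sqrt(sum(t * t for t in b))
    # Pick the sign of beta that avoids cancellation in v = b - beta * e_n.
    beta = -norm_b if b[-1] >= 0 else norm_b
    v = list(b)
    v[-1] -= beta
    return v, beta

def mgs_qr(cols):
    """Modified Gram-Schmidt on a list of column vectors; returns (Q columns, R)."""
    n = len(cols)
    q = [list(c) for c in cols]
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(t * t for t in q[k]))
        q[k] = [t / r[k][k] for t in q[k]]
        for j in range(k + 1, n):
            r[k][j] = sum(s * t for s, t in zip(q[k], q[j]))
            q[j] = [t - r[k][j] * s for s, t in zip(q[k], q[j])]
    return q, r

def solve_preprocessed(a, b):
    """Solve a x = b (a square and nonsingular, b nonzero) with no triangular solve."""
    n = len(b)
    v, beta = householder_to_last(b)
    vtv = sum(t * t for t in v)
    # w = v^T A, so that HA = A - (2 / v^T v) v w^T.
    w = [sum(v[i] * a[i][j] for i in range(n)) for j in range(n)]
    ha = [[a[i][j] - 2.0 * v[i] * w[j] / vtv for j in range(n)] for i in range(n)]
    # The columns of (HA)^T are the rows of HA.
    q, r = mgs_qr(ha)
    # x = (beta / r_nn) Q(:, n)
    return [beta / r[n - 1][n - 1] * t for t in q[n - 1]]
```

The only work after the factorization is a scaling of the last column of $Q$, which is the point of the preprocessing step.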


5.7.2 Underdetermined Systems 
We say that a linear system

$$Ax = b, \qquad A \in \mathbb{R}^{m \times n}, \; b \in \mathbb{R}^m \tag{5.7.1}$$

is underdetermined whenever $m < n$. Notice that such a system either has
no solution or has an infinity of solutions. In the second case, it is important
to distinguish between algorithms that find the minimum 2-norm solution
and those that do not necessarily do so. The first algorithm we present is
in the latter category. Assume that $A$ has full row rank and that we apply
QR with column pivoting to obtain

$$Q^T A \Pi = [\, R_1 \;\; R_2 \,]$$

where $R_1 \in \mathbb{R}^{m \times m}$ is upper triangular and $R_2 \in \mathbb{R}^{m \times (n-m)}$. Thus, $Ax = b$
transforms to

$$(Q^T A \Pi)(\Pi^T x) = [\, R_1 \;\; R_2 \,] \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = Q^T b$$

where

$$\Pi^T x = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$$

with $z_1 \in \mathbb{R}^m$ and $z_2 \in \mathbb{R}^{n-m}$. By virtue of the column pivoting, $R_1$ is
nonsingular because we are assuming that $A$ has full row rank. One solution
to the problem is therefore obtained by setting $z_1 = R_1^{-1} Q^T b$ and $z_2 = 0$.


Algorithm 5.7.1 Given $A \in \mathbb{R}^{m \times n}$ with rank$(A) = m$ and $b \in \mathbb{R}^m$, the
following algorithm finds an $x \in \mathbb{R}^n$ such that $Ax = b$.


    Q^T A Π = R    (QR with column pivoting.)
    Solve R(1:m, 1:m) z_1 = Q^T b.
    Set x = Π [ z_1 ; 0 ].


This algorithm requires $2m^2 n - m^3/3$ flops. The minimum norm solution
is not guaranteed. (A different $\Pi$ would render a smaller $z_1$.) However, if
we compute the QR factorization

$$A^T = QR = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$$

with $R_1 \in \mathbb{R}^{m \times m}$, then $Ax = b$ becomes

$$(QR)^T x = [\, R_1^T \;\; 0 \,] \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = b$$

where

$$Q^T x = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}, \qquad z_1 \in \mathbb{R}^m, \; z_2 \in \mathbb{R}^{n-m}.$$

Now the minimum norm solution does follow by setting $z_2 = 0$.


Algorithm 5.7.2 Given $A \in \mathbb{R}^{m \times n}$ with rank$(A) = m$ and $b \in \mathbb{R}^m$, the
following algorithm finds the minimal 2-norm solution to $Ax = b$.


    A^T = QR    (QR factorization)
    Solve R(1:m, 1:m)^T z = b.
    x = Q(:, 1:m) z


This algorithm requires at most $2m^2 n - 2m^3/3$ flops.

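As a rough illustration of Algorithm 5.7.2, the following Python sketch computes a minimum norm solution for a small full-row-rank system. The function name is hypothetical, matrices are lists of rows, and modified Gram-Schmidt is used as a stand-in for the QR factorization of $A^T$ purely for brevity; any thin QR factorization would serve.

```python
import math

def min_norm_solution(a, b):
    """Minimum 2-norm x with a x = b, for an m-by-n a of full row rank (m <= n)."""
    m, n = len(a), len(a[0])
    # Columns of A^T are the rows of A; orthogonalize them (A^T = Q R, Q is n-by-m).
    q = [list(row) for row in a]
    r = [[0.0] * m for _ in range(m)]
    for k in range(m):
        r[k][k] = math.sqrt(sum(t * t for t in q[k]))
        q[k] = [t / r[k][k] for t in q[k]]
        for j in range(k + 1, m):
            r[k][j] = sum(s * t for s, t in zip(q[k], q[j]))
            q[j] = [t - r[k][j] * s for s, t in zip(q[k], q[j])]
    # Forward substitution for the lower triangular system R^T z = b.
    z = [0.0] * m
    for i in range(m):
        z[i] = (b[i] - sum(r[k][i] * z[k] for k in range(i))) / r[i][i]
    # x = Q(:, 1:m) z
    return [sum(z[k] * q[k][i] for k in range(m)) for i in range(n)]
```

For $A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$ and $b = (1, 1)^T$ the formula $x = A^T(AA^T)^{-1}b$ gives $x = (1/3, 2/3, 1/3)^T$, which is what the sketch should return up to roundoff.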
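The SVD can also be used to compute the minimal norm solution of an
underdetermined $Ax = b$ problem. If

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T, \qquad r = \mathrm{rank}(A)$$

is $A$'s singular value expansion, then

$$x = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i.$$
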
As in the least squares problem, the SVD approach is desirable whenever 
A is nearly rank deficient. 


5.7.3 Perturbed Underdetermined Systems 


We conclude this section with a perturbation result for full-rank underde- 
termined systems. 




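Theorem 5.7.1 Suppose rank$(A) = m \le n$ and that $A \in \mathbb{R}^{m \times n}$, $\delta A \in \mathbb{R}^{m \times n}$,
$0 \ne b \in \mathbb{R}^m$, and $\delta b \in \mathbb{R}^m$ satisfy

$$\epsilon = \max\{\epsilon_A, \epsilon_b\} < \sigma_m(A),$$

where $\epsilon_A = \| \delta A \|_2 / \| A \|_2$ and $\epsilon_b = \| \delta b \|_2 / \| b \|_2$. If $x$ and $\hat{x}$ are minimum
norm solutions that satisfy

$$Ax = b, \qquad (A + \delta A)\hat{x} = b + \delta b,$$

then

$$\frac{\| \hat{x} - x \|_2}{\| x \|_2} \le \kappa_2(A) \left( \epsilon_A \min\{2, n - m + 1\} + \epsilon_b \right) + O(\epsilon^2).$$
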


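Proof. Let $E$ and $f$ be defined by $\delta A / \epsilon$ and $\delta b / \epsilon$. Note that rank$(A + tE) = m$
for all $0 < t < \epsilon$ and that

$$x(t) = (A + tE)^T \left( (A + tE)(A + tE)^T \right)^{-1} (b + tf)$$

satisfies $(A + tE)x(t) = b + tf$. By differentiating this expression with
respect to $t$ and setting $t = 0$ in the result we obtain

$$\dot{x}(0) = \left( I - A^T(AA^T)^{-1}A \right) E^T (AA^T)^{-1} b + A^T(AA^T)^{-1}(f - Ex).$$

Since

$$\| x \|_2 = \| A^T(AA^T)^{-1} b \|_2 \ge \sigma_m(A) \| (AA^T)^{-1} b \|_2,$$

$$\| I - A^T(AA^T)^{-1}A \|_2 = \min(1, n - m),$$

and

$$\frac{\| f \|_2}{\| x \|_2} \le \frac{\| f \|_2 \| A \|_2}{\| b \|_2},$$

we have

$$\frac{\| \hat{x} - x \|_2}{\| x \|_2} = \frac{\| x(\epsilon) - x(0) \|_2}{\| x(0) \|_2} = \epsilon \frac{\| \dot{x}(0) \|_2}{\| x \|_2} + O(\epsilon^2)$$

$$\le \epsilon \min(1, n - m) \left\{ \frac{\| E \|_2}{\| A \|_2} + \frac{\| f \|_2}{\| b \|_2} + \frac{\| E \|_2}{\| A \|_2} \right\} \kappa_2(A) + O(\epsilon^2)$$

from which the theorem follows. □


Note that there is no $\kappa_2(A)^2$ factor as in the case of overdetermined systems.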


Problems


P5.7.1 Derive the above expression for $\dot{x}(0)$.

P5.7.2 Find the minimal norm solution to the system $Ax = b$ where $A = [\, 1 \;\; 2 \;\; 3 \,]$ and
$b = 1$.

P5.7.3 Show how triangular system solving can be avoided when using the QR
factorization to solve an underdetermined system.

P5.7.4 Suppose $b, x \in \mathbb{R}^n$ are given. Consider the following problems:

(a) Find an unsymmetric Toeplitz matrix $T$ so $Tx = b$.

(b) Find a symmetric Toeplitz matrix $T$ so $Tx = b$.

(c) Find a circulant matrix $C$ so $Cx = b$.

Pose each problem in the form $Ap = b$ where $A$ is a matrix made up of entries from $x$
and $p$ is the vector of sought-after parameters.


Notes and References for Sec. 5.7

Interesting aspects concerning singular systems are discussed in


T.F. Chan (1984). “Deflated Decomposition Solutions of Nearly Singular Systems,”
SIAM J. Num. Anal. 21, 738-754.

G.H. Golub and C.D. Meyer (1986). “Using the QR Factorization and Group Inversion
to Compute, Differentiate, and Estimate the Sensitivity of Stationary Probabilities
for Markov Chains,” SIAM J. Alg. and Dis. Methods 7, 273-281.


Papers on underdetermined systems include


R.E. Cline and R.J. Plemmons (1976). “$l_2$-Solutions to Underdetermined Linear
Systems,” SIAM Review 18, 92-106.

M. Arioli and A. Laratta (1985). “Error Analysis of an Algorithm for Solving an
Underdetermined System,” Numer. Math. 46, 255-268.

J.W. Demmel and N.J. Higham (1993). “Improved Error Bounds for Underdetermined
System Solvers,” SIAM J. Matrix Anal. Appl. 14, 1-14.


The QR factorization can of course be used to solve linear systems. See


N.J. Higham (1991). “Iterative Refinement Enhances the Stability of QR Factorization
Methods for Solving Linear Equations,” BIT 31, 447-468.


Chapter 6 


Parallel Matrix 
Computations 


§6.1 Basic Concepts
§6.2 Matrix Multiplication
§6.3 Factorizations


The parallel matrix computation area has been the focus of intense
research. Although much of the work is machine/system dependent, a
number of basic strategies have emerged. Our aim is to present these along
with a picture of what it is like to “think parallel” during the design of a
matrix computation.

The distributed and shared memory paradigms are considered. We use
matrix-vector multiplication to introduce the notion of a node program in
§6.1. Load balancing, speed-up, and synchronization are also discussed.
In §6.2 matrix-matrix multiplication is used to show the effect of blocking
on granularity and to convey the spirit of two-dimensional data flow. Two
parallel implementations of the Cholesky factorization are given in §6.3.


Before You Begin 


Chapter 1, §4.1, and §4.2 are assumed. Within this chapter there are
the following dependencies:


    §6.1 → §6.2 → §6.3


Complementary references include the books by Schönauer (1987), Hockney
and Jesshope (1988), Modi (1988), Ortega (1988), Dongarra, Duff,
Sorensen, and van der Vorst (1991), and Golub and Ortega (1993) and the
excellent review papers by Heller (1978), Ortega and Voigt (1985),
Gallivan, Plemmons, and Sameh (1990), and Demmel, Heath, and van der Vorst
(1993).


6.1 Basic Concepts 


In this section we introduce the distributed and shared memory paradigms
using the gaxpy operation

$$z = y + Ax, \qquad A \in \mathbb{R}^{n \times n}, \; x, y, z \in \mathbb{R}^n \tag{6.1.1}$$

as an example. In practice, there is a fuzzy line between these two styles
of parallel computing and typically a blend of our comments applies to any
particular machine.


6.1.1 Distributed Memory Systems 


In a distributed memory multiprocessor each processor has a local memory
and executes its own node program. The program can alter values in
the executing processor's local memory and can send data in the form of
messages to the other processors in the network. The interconnection of
the processors defines the network topology and one simple example that
is good enough for our introduction is the ring. See Figure 6.1.1.


    [FIGURE 6.1.1 A Four-Processor Ring]


Other important interconnection schemes include the mesh and torus (for their
close correspondence with two-dimensional arrays), the hypercube (for its
generality and optimality), and the tree (for its handling of divide and
conquer procedures). See Ortega and Voigt (1985) for a discussion of the
possibilities. Our immediate goal is to develop a ring algorithm for (6.1.1).
Matrix multiplication on a torus is discussed in §6.2.

Each processor has an identification number. The $\mu$th processor is
designated by Proc($\mu$). We say that Proc($\lambda$) is a neighbor of Proc($\mu$) if there
is a direct physical connection between them. Thus, in a $p$-processor ring,
Proc($p-1$) and Proc(1) are neighbors of Proc($p$).




Important factors in the design of an effective distributed memory al- 
gorithm include (a) the number of processors and the capacity of the local 
memories, (b) how the processors are interconnected, (c) the speed of com- 
putation relative to the speed of interprocessor communication, and (d) 
whether or not a node is able to compute and communicate at the same 
time. 


6.1.2 Communication 


To describe the sending and receiving of messages we adopt a simple
notation:


    send( {matrix} , {id of the receiving processor} )
    recv( {matrix} , {id of the sending processor} )


Scalars and vectors are matrices and therefore messages. In our model,
if Proc($\mu$) executes the instruction send($V_{loc}$, $\lambda$), then a copy of the local
matrix $V_{loc}$ is sent to Proc($\lambda$) and the execution of Proc($\mu$)'s node program
resumes immediately. It is legal for a processor to send a message to itself.
To emphasize that a matrix is stored in a local memory we use the subscript
“loc.”

If Proc($\mu$) executes the instruction recv($U_{loc}$, $\lambda$), then the execution of
its node program is suspended until a message is received from Proc($\lambda$).
Once received, the message is placed in a local matrix $U_{loc}$ and Proc($\mu$)
resumes execution of its node program.
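The send/recv semantics just described (a send that returns immediately, a recv that suspends until the message arrives) can be mimicked with threads and FIFO queues. The sketch below is only a model for experimentation: the `Network` class and the `exchange` helper are hypothetical names, vectors are Python lists, and one mailbox per ordered pair of processors is an assumption that also enforces in-order arrival.

```python
import queue
import threading

class Network:
    """One FIFO mailbox per ordered (sender, receiver) pair of processors."""

    def __init__(self, p):
        ids = range(1, p + 1)
        self.box = {(s, d): queue.Queue() for s in ids for d in ids}

    def send(self, data, src, dst):
        # Non-blocking: deposit a copy of the local data and return at once.
        self.box[(src, dst)].put(list(data))

    def recv(self, src, dst):
        # Blocking: wait until a message from Proc(src) is available.
        return self.box[(src, dst)].get()

def exchange():
    """Two processors swap their local vectors."""
    net = Network(2)
    received = {}

    def node(mu):
        other = 2 if mu == 1 else 1
        net.send([float(mu)], mu, other)
        received[mu] = net.recv(other, mu)

    threads = [threading.Thread(target=node, args=(mu,)) for mu in (1, 2)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return received
```

Because each processor sends before it receives, neither thread can block forever in this exchange.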

Although the syntax and semantics of our send/receive notation are
adequate for our purposes, they do suppress a number of important details:


• Message assembly overhead. In practice, there may be a penalty
  associated with the transmission of a matrix whose entries are not
  contiguous in the sender's local memory. We ignore this detail.


• Message tagging. Messages need not arrive in the order they are sent,
  and a system of message tagging is necessary so that the receiver is
  not “confused.” We ignore this detail by assuming that messages do
  arrive in the order that they are sent.


• Message interpretation overhead. In practice a message is a bit string,
  and a header must be provided that indicates to the receiver the
  dimensions of the matrix and the format of the floating point words
  that are used to represent its entries. Going from message to stored
  matrix takes time, but it is an overhead that we do not try to quantify.


These simplifications enable us to focus on high-level algorithmic ideas. But
it should be remembered that the success of a particular implementation
may hinge upon the control of these hidden overheads.


6.1.3 Some Distributed Data Structures


Before we can specify our first distributed memory algorithm, we must 
consider the matter of data layout. How are the participating matrices and 
vectors distributed around the network? 

Suppose $x \in \mathbb{R}^n$ is to be distributed among the local memories of a
$p$-processor network. Assume for the moment that $n = rp$. Two “canonical”
approaches to this problem are store-by-row and store-by-column.

In store-by-column we regard the vector $x$ as an $r$-by-$p$ matrix,

$$x_{r \times p} = [\, x(1:r) \;\; x(r+1:2r) \;\; \cdots \;\; x(1+(p-1)r:n) \,],$$

and store each column in a processor, i.e., $x(1+(\mu-1)r:\mu r) \in$ Proc($\mu$).
(In this context “$\in$” means “is stored in.”) Note that each processor houses
a contiguous portion of $x$.

In the store-by-row scheme we regard $x$ as a $p$-by-$r$ matrix

$$x_{p \times r} = [\, x(1:p) \;\; x(p+1:2p) \;\; \cdots \;\; x((r-1)p+1:n) \,],$$

and store each row in a processor, i.e., $x(\mu:p:n) \in$ Proc($\mu$). Store-by-row is
sometimes referred to as the wrap method of distributing a vector because
the components of $x$ can be thought of as cards in a deck that are “dealt”
to the processors in wrap-around fashion.

If $n$ is not an exact multiple of $p$, then these ideas go through with minor
modification. Consider store-by-column with $n = 14$ and $p = 4$:

$$x^T = [\, \underbrace{x_1 \; x_2 \; x_3 \; x_4}_{\text{Proc(1)}} \mid \underbrace{x_5 \; x_6 \; x_7 \; x_8}_{\text{Proc(2)}} \mid \underbrace{x_9 \; x_{10} \; x_{11}}_{\text{Proc(3)}} \mid \underbrace{x_{12} \; x_{13} \; x_{14}}_{\text{Proc(4)}} \,].$$

In general, if $n = pr + q$ with $0 \le q < p$, then Proc(1), ..., Proc($q$) can
each house $r + 1$ components and Proc($q+1$), ..., Proc($p$) can house $r$
components. In store-by-row we simply let Proc($\mu$) house $x(\mu:p:n)$.
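The two vector layouts amount to simple index computations. A minimal Python sketch follows; the function names are hypothetical and the returned indices are 1-based to match the notation of the text.

```python
def store_by_column(n, p):
    """Contiguous layout: the 1-based indices housed by Proc(1), ..., Proc(p)."""
    r, q = divmod(n, p)
    blocks, start = [], 1
    for mu in range(1, p + 1):
        # The first q processors house r + 1 components, the rest house r.
        size = r + 1 if mu <= q else r
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks

def store_by_row(n, p):
    """Wrap layout: Proc(mu) houses x(mu:p:n)."""
    return [list(range(mu, n + 1, p)) for mu in range(1, p + 1)]
```

With `n = 14` and `p = 4`, `store_by_column` reproduces the partition displayed above, while `store_by_row` deals the indices out as 1, 5, 9, 13 to Proc(1), then 2, 6, 10, 14 to Proc(2), and so on.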

Similar options apply to the layout of a matrix. There are four obvious
possibilities if $A \in \mathbb{R}^{n \times n}$ and (for simplicity) $n = rp$:


    Orientation    Style         What is in Proc(μ)
    Column         Contiguous    A(:, 1+(μ-1)r:μr)
    Column         Wrap          A(:, μ:p:n)
    Row            Contiguous    A(1+(μ-1)r:μr, :)
    Row            Wrap          A(μ:p:n, :)


These strategies have block analogs. For example, if $A = [A_1, \ldots, A_N]$ is
a block column partitioning, then we could arrange to have Proc($\mu$) store
$A_i$ for $i = \mu:p:N$.


6.1.4 Gaxpy on a Ring


We are now set to develop a ring algorithm for the gaxpy $z = y + Ax$
($A \in \mathbb{R}^{n \times n}$, $x, y \in \mathbb{R}^n$). For clarity, assume that $n = rp$ where $p$ is the size
of the ring. Partition the gaxpy as

$$\begin{bmatrix} z_1 \\ \vdots \\ z_p \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix} + \begin{bmatrix} A_{11} & \cdots & A_{1p} \\ \vdots & \ddots & \vdots \\ A_{p1} & \cdots & A_{pp} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} \tag{6.1.2}$$

where $A_{ij} \in \mathbb{R}^{r \times r}$ and $x_i, y_i, z_i \in \mathbb{R}^r$. We assume that at the start of
computation Proc($\mu$) houses $x_\mu$, $y_\mu$, and the $\mu$th block row of $A$. Upon
completion we set as our goal the overwriting of $y_\mu$ by $z_\mu$. From the Proc($\mu$)
perspective, the computation of

$$z_\mu = y_\mu + \sum_{\tau=1}^{p} A_{\mu\tau} x_\tau$$

involves local data ($A_{\mu\tau}$, $y_\mu$, $x_\mu$) and nonlocal data ($x_\tau$, $\tau \ne \mu$). To make
the nonlocal portions of $x$ available, we circulate its subvectors around the
ring. For example, in the $p = 3$ case we rotate $x_1$, $x_2$, and $x_3$ as follows:


    step    Proc(1)    Proc(2)    Proc(3)
    1       x_3        x_1        x_2
    2       x_2        x_3        x_1
    3       x_1        x_2        x_3


When a subvector of $x$ “visits”, the host processor must incorporate the
appropriate term into its running sum:


    step    Proc(1)                  Proc(2)                  Proc(3)
    1       y_1 = y_1 + A_13 x_3     y_2 = y_2 + A_21 x_1     y_3 = y_3 + A_32 x_2
    2       y_1 = y_1 + A_12 x_2     y_2 = y_2 + A_23 x_3     y_3 = y_3 + A_31 x_1
    3       y_1 = y_1 + A_11 x_1     y_2 = y_2 + A_22 x_2     y_3 = y_3 + A_33 x_3


In general, the “merry-go-round” of $x$ subvectors makes $p$ “stops.” For each
received $x$-subvector, a processor performs an $r$-by-$r$ gaxpy.


Algorithm 6.1.1 Suppose $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$, and $y \in \mathbb{R}^n$ are given and
that $z = y + Ax$. If each processor in a $p$-processor ring executes the
following node program and $n = rp$, then upon completion Proc($\mu$) houses
$z(1+(\mu-1)r:\mu r)$ in $y_{loc}$. Assume the following local memory initializations:
$p$, $\mu$ (the node id), left and right (the neighbor id's), $n$, row $= 1+(\mu-1)r:\mu r$,
$A_{loc} = A(\text{row}, :)$, $x_{loc} = x(\text{row})$, $y_{loc} = y(\text{row})$.

    for t = 1:p
        send(x_loc, right)
        recv(x_loc, left)
        τ = μ - t
        if τ ≤ 0
            τ = τ + p
        end
        { x_loc = x(1+(τ-1)r:τr) }
        y_loc = y_loc + A_loc(:, 1+(τ-1)r:τr) x_loc
    end


The index $\tau$ names the currently available $x$ subvector. Once it is
computed it is possible to carry out the update of the locally housed portion of
$y$. The send-recv pair passes the currently housed $x$ subvector to the right
and waits to receive the next one from the left. Synchronization is achieved
because the local $y$ update cannot begin until the “new” $x$ subvector
arrives. It is impossible for one processor to “race ahead” of the others or for
an $x$ subvector to pass another in the merry-go-round. The algorithm is
tailored to the ring topology in that only nearest neighbor communication
is involved. The computation is also perfectly load balanced, meaning that
each processor has the same amount of computation and communication.
Load imbalance is discussed further in §6.1.7.

The design of a parallel program involves subtleties that do not arise in 
the uniprocessor setting. For example, if we inadvertently reverse the order 
of the send and the recv, then each processor starts its node program by 
waiting for a message from its left neighbor. Since that neighbor in turn is 
waiting for a message from its left neighbor, a state of deadlock results. 
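To experiment with the ring gaxpy on an ordinary workstation, one can let threads play the role of processors and FIFO queues play the role of the links. The following Python sketch is a simulation under those assumptions (the name `ring_gaxpy` and the list-of-rows matrix format are illustrative choices, and $n$ is assumed to be a multiple of $p$); it is not a model of any real message passing library.

```python
import queue
import threading

def ring_gaxpy(a, x, y, p):
    """Simulate Algorithm 6.1.1: return z = y + a x using p ring 'processors'."""
    n = len(x)
    r = n // p
    # inbox[mu] holds messages sent to Proc(mu) by its left neighbor.
    inbox = {mu: queue.Queue() for mu in range(1, p + 1)}
    z = [0.0] * n

    def node(mu):
        right = mu % p + 1
        rows = range((mu - 1) * r, mu * r)       # 0-based version of "row"
        x_loc = [x[i] for i in rows]
        y_loc = [y[i] for i in rows]
        for t in range(1, p + 1):
            inbox[right].put(x_loc)              # send(x_loc, right): returns at once
            x_loc = inbox[mu].get()              # recv(x_loc, left): blocks until arrival
            tau = mu - t
            if tau <= 0:
                tau += p
            off = (tau - 1) * r                  # x_loc now holds block tau of x
            for i, row in enumerate(rows):
                y_loc[i] += sum(a[row][off + j] * x_loc[j] for j in range(r))
        for i, row in enumerate(rows):
            z[row] = y_loc[i]

    threads = [threading.Thread(target=node, args=(mu,)) for mu in range(1, p + 1)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return z
```

Swapping the `put` and `get` lines in `node` would leave every thread blocked in `get`, which is the deadlock scenario described above.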


6.1.5 The Cost of Communication 


Communication overheads can be estimated if we model the cost of sending
and receiving a message. To that end we assume that a send or recv
involving $m$ floating point numbers requires

$$\tau(m) = \alpha_d + \beta_d m \tag{6.1.3}$$

seconds to carry out. Here $\alpha_d$ is the time required to initiate the send or
recv and $\beta_d$ is the reciprocal of the rate that a message can be transferred.
Note that this model does not take into consideration the “distance”
between the sender and receiver. Clearly, it takes longer to pass a message
halfway around a ring than to a neighbor. That is why it is always desirable
to arrange (if possible) a distributed computation so that communication
is just between neighbors.

During each step in Algorithm 6.1.1 an $r$-vector is sent and received and
$2r^2$ flops are performed. If the computation proceeds at $R$ flops per second
and there is no idle waiting associated with the recv, then each $y_{loc}$ update
requires approximately $(2r^2/R) + 2(\alpha_d + \beta_d r)$ seconds.

Another instructive statistic is the computation-to-communication ratio.
For Algorithm 6.1.1 this is prescribed by

$$\frac{\text{Time spent computing}}{\text{Time spent communicating}} \approx \frac{2r^2/R}{2(\alpha_d + \beta_d r)}.$$

This fraction quantifies the overhead of communication relative to the
volume of computation. Clearly, as $r = n/p$ grows, the fraction of time spent
computing increases.


6.1.6 Efficiency and Speed-Up 
The efficiency of a $p$-processor parallel algorithm is given by

$$E = \frac{T(1)}{p\,T(p)}$$

where $T(k)$ is the time required to execute the program on $k$ processors.
If computation proceeds at $R$ flops/sec and communication is modeled by
(6.1.3), then a reasonable estimate of $T(k)$ for Algorithm 6.1.1 is given by

$$T(k) = \sum_{t=1}^{k} \left[ 2(n/k)^2/R + 2(\alpha_d + \beta_d (n/k)) \right] = \frac{2n^2}{Rk} + 2\alpha_d k + 2\beta_d n$$

for $k > 1$. This assumes no idle waiting. If $k = 1$, then no communication
is required and $T(1) = 2n^2/R$. It follows that the efficiency

$$E = \frac{1}{1 + \dfrac{pR}{n}\left( \alpha_d \dfrac{p}{n} + \beta_d \right)}$$

improves with increasing $n$ and degrades with increasing $p$ or $R$. In
practice, benchmarking is the only dependable way to assess efficiency.
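The efficiency model is easy to tabulate. The Python sketch below evaluates $T(k)$ and $E$ directly from the per-step cost and also from the closed form; the function names and any parameter values fed to them are illustrative assumptions, not measurements of a real machine.

```python
def ring_time(n, k, rate, alpha_d, beta_d):
    """Estimate T(k) for Algorithm 6.1.1 under model (6.1.3), no idle waiting."""
    if k == 1:
        return 2.0 * n * n / rate                 # no communication on one processor
    r = n / k
    # k steps, each with 2 r^2 flops plus one send and one recv of r numbers.
    return k * (2.0 * r * r / rate + 2.0 * (alpha_d + beta_d * r))

def efficiency(n, p, rate, alpha_d, beta_d):
    """E = T(1) / (p T(p))."""
    t1 = ring_time(n, 1, rate, alpha_d, beta_d)
    return t1 / (p * ring_time(n, p, rate, alpha_d, beta_d))

def efficiency_closed_form(n, p, rate, alpha_d, beta_d):
    """E = 1 / (1 + (p R / n)(alpha_d p / n + beta_d))."""
    return 1.0 / (1.0 + (p * rate / n) * (alpha_d * p / n + beta_d))
```

Evaluating `efficiency` over a grid of `n` and `p` shows the trend stated above: for fixed machine parameters the efficiency rises with `n` and falls with `p`.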

A concept related to efficiency is speed-up. We say that a parallel
algorithm for a particular problem achieves speed-up $S$ if

$$S = T_{seq} / T_{par}$$

where $T_{par}$ is the time required for execution of the parallel program and
$T_{seq}$ is the time required by one processor when the best uniprocessor
procedure is used. For some problems, the fastest sequential algorithm does
not parallelize and so two distinct algorithms are involved in the speed-up
assessment.


¹ We mention that these simple measures are not particularly illuminating in systems
where the nodes are able to overlap computation and communication.


6.1.7 The Challenge of Load Balancing


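If we apply Algorithm 6.1.1 to a matrix $A \in \mathbb{R}^{n \times n}$ that is lower triangular,
then approximately half of the flops associated with the $y_{loc}$ updates are
unnecessary because half of the $A_{ij}$ in (6.1.2) are zero. In particular, in the
$\mu$th processor, $A_{loc}(:, 1+(\tau-1)r:\tau r)$ is zero if $\tau > \mu$. Thus, if we guard
the $y_{loc}$ update as follows,


    if τ ≤ μ
        y_loc = y_loc + A_loc(:, 1+(τ-1)r:τr) x_loc
    end


then the overall number of flops is halved. This solves the superfluous flops
problem but it creates a load imbalance problem. Proc($\mu$) oversees about
$\mu r^2/2$ flops, an increasing function of the processor id $\mu$. Consider the
following $r = p = 3$ example:

$$
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \\ z_5 \\ z_6 \\ z_7 \\ z_8 \\ z_9 \end{bmatrix}
=
\left[ \begin{array}{ccc|ccc|ccc}
\alpha & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\alpha & \alpha & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\alpha & \alpha & \alpha & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline
\beta & \beta & \beta & \beta & 0 & 0 & 0 & 0 & 0 \\
\beta & \beta & \beta & \beta & \beta & 0 & 0 & 0 & 0 \\
\beta & \beta & \beta & \beta & \beta & \beta & 0 & 0 & 0 \\ \hline
\gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & 0 & 0 \\
\gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & 0 \\
\gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma
\end{array} \right]
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \\ x_9 \end{bmatrix}
+
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \\ y_9 \end{bmatrix}
$$

Here, Proc(1) handles the $\alpha$ part, Proc(2) handles the $\beta$ part, and Proc(3)
handles the $\gamma$ part.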

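However, if processors 1, 2, and 3 compute $(z_1, z_4, z_7)$, $(z_2, z_5, z_8)$, and
$(z_3, z_6, z_9)$, respectively, then approximate load balancing results:

$$
\begin{bmatrix} z_1 \\ z_4 \\ z_7 \\ z_2 \\ z_5 \\ z_8 \\ z_3 \\ z_6 \\ z_9 \end{bmatrix}
=
\left[ \begin{array}{ccc|ccc|ccc}
\alpha & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\beta & \beta & \beta & \beta & 0 & 0 & 0 & 0 & 0 \\
\gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & 0 & 0 \\ \hline
\alpha & \alpha & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\beta & \beta & \beta & \beta & \beta & 0 & 0 & 0 & 0 \\
\gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & 0 \\ \hline
\alpha & \alpha & \alpha & 0 & 0 & 0 & 0 & 0 & 0 \\
\beta & \beta & \beta & \beta & \beta & \beta & 0 & 0 & 0 \\
\gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma
\end{array} \right]
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \\ x_9 \end{bmatrix}
+
\begin{bmatrix} y_1 \\ y_4 \\ y_7 \\ y_2 \\ y_5 \\ y_8 \\ y_3 \\ y_6 \\ y_9 \end{bmatrix}
$$

The amount of arithmetic still increases with $\mu$, but the effect is not
noticeable if $n \gg p$.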

The development of the general algorithm requires some index
manipulation. Assume that Proc($\mu$) is initialized with $A_{loc} = A(\mu:p:n, :)$ and
$y_{loc} = y(\mu:p:n)$, and assume that the contiguous $x$-subvectors circulate as
before. If at some stage $x_{loc}$ contains $x(1+(\tau-1)r:\tau r)$, then the update

$$y_{loc} = y_{loc} + A_{loc}(:, 1+(\tau-1)r:\tau r)\, x_{loc}$$

implements

$$y(\mu:p:n) = y(\mu:p:n) + A(\mu:p:n, 1+(\tau-1)r:\tau r)\, x(1+(\tau-1)r:\tau r).$$


To exploit the triangular structure of $A$ in the $y_{loc}$ computation, we express
the gaxpy as a double loop:


    for α = 1:r
        for β = 1:r
            y_loc(α) = y_loc(α) + A_loc(α, β + (τ-1)r) x_loc(β)
        end
    end


The $A_{loc}$ reference refers to $A(\mu + (\alpha-1)p, \beta + (\tau-1)r)$ which is zero unless
the column index is less than or equal to the row index. Abbreviating the
inner loop range with this in mind we obtain


Algorithm 6.1.2 Suppose $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$ are given and
that $z = y + Ax$. Assume that $n = rp$ and that $A$ is lower triangular. If
each processor in a $p$-processor ring executes the following node program,
then upon completion Proc($\mu$) houses $z(\mu:p:n)$ in $y_{loc}$. Assume the following
local memory initializations: $p$, $\mu$ (the node id), left and right (the neighbor
id's), $n$, $A_{loc} = A(\mu:p:n, :)$, $y_{loc} = y(\mu:p:n)$, and $x_{loc} = x(1+(\mu-1)r:\mu r)$.


    r = n/p
    for t = 1:p
        send(x_loc, right)
        recv(x_loc, left)
        τ = μ - t
        if τ ≤ 0
            τ = τ + p
        end
        { x_loc = x(1+(τ-1)r:τr) }
        for α = 1:r
            for β = 1:min(r, μ + (α-1)p - (τ-1)r)
                y_loc(α) = y_loc(α) + A_loc(α, β + (τ-1)r) x_loc(β)
            end
        end
    end
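The index bookkeeping in Algorithm 6.1.2 is easy to get wrong, so a serial walk-through can help. The Python sketch below visits the processors one after another instead of running them concurrently, looks up each circulating block of $x$ directly rather than passing messages, and counts the flops each processor performs. The function name is hypothetical, $n$ is assumed to be a multiple of $p$, and the inner loop bound is capped at $r$ so that it never runs past the end of the current $x$ block.

```python
def wrap_triangular_gaxpy(a, x, y, p):
    """Serial walk through Algorithm 6.1.2; returns (z, flops per processor)."""
    n = len(x)
    r = n // p
    z = [0.0] * n
    flops = []
    for mu in range(1, p + 1):
        rows = list(range(mu, n + 1, p))          # 1-based global rows mu:p:n
        y_loc = [y[i - 1] for i in rows]
        count = 0
        for t in range(1, p + 1):
            tau = mu - t
            if tau <= 0:
                tau += p
            off = (tau - 1) * r                   # x_loc = x(1+(tau-1)r : tau r)
            for al in range(1, r + 1):
                # Global row is mu + (al-1)p; stop at the diagonal or the block end.
                top = min(r, mu + (al - 1) * p - off)
                for be in range(1, top + 1):
                    y_loc[al - 1] += a[rows[al - 1] - 1][be + off - 1] * x[be + off - 1]
                    count += 2
        for al, i in enumerate(rows):
            z[i - 1] = y_loc[al]
        flops.append(count)
    return z, flops
```

For $n = 9$ and $p = 3$ the wrap mapping yields 24, 30, and 36 flops on the three processors, compared with 12, 30, and 48 when each processor owns a contiguous block of rows.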


Having to map indices back and forth between “node space” and “global
space” is one aspect of distributed matrix computations that requires care
and (hopefully) compiler assistance.


6.1.8 Tradeoffs


As we did in §1.1, let us develop a column-oriented gaxpy and anticipate
its performance. With the block column partitioning

$$A = [A_1, \ldots, A_p], \qquad A_\mu \in \mathbb{R}^{n \times r}, \; r = n/p,$$

the gaxpy $z = y + Ax$ becomes

$$z = y + \sum_{\mu=1}^{p} A_\mu x_\mu,$$

where $x_\mu = x(1+(\mu-1)r:\mu r)$. Assume that Proc($\mu$) contains $A_\mu$ and $x_\mu$.
Its contribution to the gaxpy is the product $A_\mu x_\mu$ and involves local data.
However, these products must be summed. We assign this task to Proc(1)
which we assume contains $y$. The strategy is thus for each processor to
compute $A_\mu x_\mu$ and to send the result to Proc(1).


Algorithm 6.1.3 Suppose $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$ are given and
that $z = y + Ax$. If each processor in a $p$-processor network executes the
following node program and $n = rp$, then upon completion Proc(1) houses
$z$. Assume the following local memory initializations: $p$, $\mu$ (the node id),
$n$, $x_{loc} = x(1+(\mu-1)r:\mu r)$, $A_{loc} = A(:, 1+(\mu-1)r:\mu r)$, and (in Proc(1)
only) $y_{loc} = y$.

    if μ = 1
        y_loc = y_loc + A_loc x_loc
        for t = 2:p
            recv(w_loc, t)
            y_loc = y_loc + w_loc
        end
    else
        w_loc = A_loc x_loc
        send(w_loc, 1)
    end

At first glance this seems to be much less attractive than the row-oriented
Algorithm 6.1.1. The additional responsibilities of Proc(1) mean that it
has more arithmetic to perform by a factor of about

$$\frac{2n^2/p + np}{2n^2/p} = 1 + \frac{p^2}{2n}$$

and more messages to process by a factor of about $p$. This imbalance
becomes less critical if $n \gg p$ and the communication parameters $\alpha_d$ and $\beta_d$
are small enough. Another possible mitigating factor is that Algorithm
6.1.3 manipulates length $n$ vectors whereas Algorithm 6.1.1 works
with length $n/p$ vectors. If the nodes are capable of vector arithmetic, then
the longer vectors may raise the level of performance.

This brief comparison of Algorithms 6.1.1 and 6.1.3 reminds us once 
again that different implementations of the same computation can have 
very different performance characteristics. 


6.1.9 Shared Memory Systems 


We now discuss the gaxpy problem for a shared memory multiprocessor. In
this environment each processor has access to a common, global memory
as depicted in Figure 6.1.2. Communication between processors is achieved
by reading and writing to global variables that reside in the global memory.
Each processor executes its own local program and has its own local memory.
Data flows to and from the global memory during execution.


    [FIGURE 6.1.2 A Four-Processor Shared Memory System: four processors connected to a Global Memory]


All the concerns that attend distributed memory computation are with 
us in modified form. The overall procedure should be load balanced and the 
computations should be arranged so that the individual processors have 
to wait as little as possible for something useful to compute. The traffic 
between the global and local memories must be managed carefully, because 
the extent of such data transfers is typically a significant overhead. (It 
corresponds to interprocessor communication in the distributed memory 
setting and to data motion up and down a memory hierarchy as discussed 
in §1.4.5.) The nature of the physical connection between the processors
and the shared memory is very important and can affect algorithmic
development. However, for simplicity we regard this aspect of the system as a
black box as shown in Figure 6.1.2.


6.1.10 A Shared Memory Gaxpy

Consider the following partitioning of the $n$-by-$n$ gaxpy problem $z = y + Ax$:

$$\begin{bmatrix} z_1 \\ \vdots \\ z_p \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix} + \begin{bmatrix} A_1 \\ \vdots \\ A_p \end{bmatrix} x \tag{6.1.4}$$

Here we assume that $n = rp$ and that $A_\mu \in \mathbb{R}^{r \times n}$, $y_\mu \in \mathbb{R}^r$, and $z_\mu \in \mathbb{R}^r$.
We use the following algorithm to introduce the basic ideas and notations.


Algorithm 6.1.4 Suppose $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$, and $y \in \mathbb{R}^n$ reside in a
global memory accessible to $p$ processors. If $n = rp$ and each processor
executes the following algorithm, then upon completion, $y$ is overwritten
by $z = y + Ax$. Assume the following initializations in each local memory:
$p$, $\mu$ (the node id), and $n$.


    r = n/p
    row = 1+(μ-1)r:μr
    x_loc = x
    y_loc = y(row)
    for j = 1:n
        a_loc = A(row, j)
        y_loc = y_loc + a_loc x_loc(j)
    end
    y(row) = y_loc

We assume that a copy of this program resides in each processor. Float- 
ing point variables that are local to an individual processor have a “loc” 
subscript. 

Data is transferred to and from the global memory during the execution
of Algorithm 6.1.4. There are two global memory reads before the loop
($x_{loc} = x$ and $y_{loc} = y(\text{row})$), one read each time through the loop
($a_{loc} = A(\text{row}, j)$), and one write after the loop ($y(\text{row}) = y_{loc}$).

Only one processor writes to a given global memory location in y, and 
so there is no need to synchronize the participating processors. Each has 
a completely independent part of the overall gaxpy operation and does not 
have to monitor the progress of the other processors. The computation is 
statically scheduled because the partitioning of work is determined before 
execution. 
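A thread-based Python sketch of Algorithm 6.1.4 follows. It is only a model: the shared lists `a`, `x`, and `y` play the role of the global memory, the local copies made inside `node` play the role of local memory, the function name is hypothetical, and $n$ is assumed to be a multiple of $p$.

```python
import threading

def shared_gaxpy(a, x, y, p):
    """Each of p threads runs the Algorithm 6.1.4 node program; y is overwritten."""
    n = len(x)
    r = n // p

    def node(mu):
        rows = range((mu - 1) * r, mu * r)        # 0-based version of "row"
        x_loc = list(x)                           # global read of x
        y_loc = [y[i] for i in rows]              # global read of y(row)
        for j in range(n):
            a_loc = [a[i][j] for i in rows]       # global read of A(row, j)
            for k in range(r):
                y_loc[k] += a_loc[k] * x_loc[j]
        for k, i in enumerate(rows):              # global write of y(row)
            y[i] = y_loc[k]

    threads = [threading.Thread(target=node, args=(mu,)) for mu in range(1, p + 1)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return y
```

No lock or barrier appears because each thread writes to its own slice of `y` and reads only data that no thread modifies.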

If $A$ is lower triangular, then steps have to be taken to preserve the
load balancing in Algorithm 6.1.4. As we discovered in §6.1.7, the wrap
mapping is a vehicle for doing this. Assigning Proc($\mu$) the computation of
$z(\mu:p:n) = y(\mu:p:n) + A(\mu:p:n, :)x$ effectively partitions the $n^2$ flops among
the $p$ processors.


6.1.11 Memory Traffic Overhead


It is important to recognize that overall performance depends strongly on
the overheads associated with the reads and writes to the global memory.
If such a data transfer involves $m$ floating point numbers, then we model
the transfer time by

$$\tau(m) = \alpha_s + \beta_s m. \tag{6.1.5}$$

The parameter $\alpha_s$ represents a start-up overhead and $\beta_s$ is the reciprocal
transfer rate. We modeled interprocessor communication in the distributed
environment exactly the same way. (See (6.1.3).)

Accounting for all the shared memory reads and writes in Algorithm
6.1.4 we see that each processor spends time

$$T \approx (n + 3)\alpha_s + \frac{n^2}{p}\beta_s$$

communicating with global memory.
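The count behind this estimate can be written out transfer by transfer. The tally below applies model (6.1.5) to each read and write in Algorithm 6.1.4, with $r = n/p$; it is a bookkeeping aid rather than an additional result.

```latex
\begin{align*}
x_{loc} = x &: \quad \alpha_s + \beta_s n \\
y_{loc} = y(\text{row}) &: \quad \alpha_s + \beta_s r \\
a_{loc} = A(\text{row}, j),\ j = 1{:}n &: \quad n(\alpha_s + \beta_s r) \\
y(\text{row}) = y_{loc} &: \quad \alpha_s + \beta_s r \\
\text{total} &: \quad (n+3)\alpha_s + (nr + n + 2r)\beta_s
  \;\approx\; (n+3)\alpha_s + \frac{n^2}{p}\beta_s
\end{align*}
```

The approximation keeps only the dominant volume term $nr = n^2/p$.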

We organized the computation so that one column of A(row, :) is read
at a time from shared memory. If the local memory is large enough, then
the loop in Algorithm 6.1.4 can be replaced with

    A_loc = A(row, :)
    y_loc = y_loc + A_loc x_loc

This changes the communication overhead to

$$\tilde{T} \approx 3\alpha_s + \frac{n^2}{p}\beta_s,$$

a significant improvement if the start-up parameter α_s is large.
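
To make the two overhead estimates concrete, the following sketch tallies every global memory transfer of Algorithm 6.1.4 under the model (6.1.5), once with the column-at-a-time loop and once with the single read of A(row, :). The numerical values of α_s and β_s are assumed for illustration only, and the function names are hypothetical.

```python
def transfer_time(m, alpha, beta):
    # Model (6.1.5): moving m floating point numbers costs alpha + beta*m.
    return alpha + beta * m

def column_at_a_time(n, p, alpha, beta):
    r = n // p
    sizes = [n, r]            # x_loc = x and y_loc = y(row)
    sizes += [r] * n          # a_loc = A(row, j) for j = 1:n
    sizes += [r]              # y(row) = y_loc
    return sum(transfer_time(m, alpha, beta) for m in sizes)

def block_row_at_once(n, p, alpha, beta):
    r = n // p
    sizes = [n, r, r * n, r]  # x, y(row), A(row, :), and the write of y(row)
    return sum(transfer_time(m, alpha, beta) for m in sizes)

n, p = 1000, 10
alpha, beta = 1e-4, 1e-8      # assumed values, not measurements
print(column_at_a_time(n, p, alpha, beta))
print(block_row_at_once(n, p, alpha, beta))
```

The volume of data moved is identical in the two variants. The saving comes entirely from replacing the n start-ups inside the loop by a single one.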


6.1.12 Barrier Synchronization 


Let us consider the shared memory version of Algorithm 6.1.4 in which 
the gaxpy is column oriented. Assume n = rp and col = 1 + (μ - 1)r:μr.
A reasonable idea is to use a global array W(1:n, 1:p) to house the
products A(:, col)x(col) produced by each processor, and then have some
chosen processor (say Proc(1)) add its columns:

    A_loc = A(:, col); x_loc = x(col); w_loc = A_loc x_loc; W(:, μ) = w_loc
    if μ = 1
        y_loc = y
        for j = 1:p
            w_loc = W(:, j)
            y_loc = y_loc + w_loc
        end
        y = y_loc
    end

However, this strategy is seriously flawed because there is no guarantee that
W(1:n, 1:p) is fully initialized when Proc(1) begins the summation process.

What we need is a synchronization construct that can delay the Proc(1)
summation until all the processors have computed and stored their
contributions in the W array. For this purpose many shared memory systems
support some version of the barrier construct which we introduce in the
following algorithm:


Algorithm 6.1.5 Suppose A ∈ R^{n×n}, x ∈ R^n, and y ∈ R^n reside in a
global memory accessible to p processors. If n = rp and each processor
executes the following algorithm, then upon completion y is overwritten by
y + Ax. Assume the following initializations in each local memory: p, μ
(the node id), and n.

    r = n/p; col = 1 + (μ - 1)r:μr; A_loc = A(:, col); x_loc = x(col)
    w_loc = A_loc x_loc
    W(:, μ) = w_loc
    barrier
    if μ = 1
        y_loc = y
        for j = 1:p
            w_loc = W(:, j)
            y_loc = y_loc + w_loc
        end
        y = y_loc
    end


To understand the barrier, it is convenient to regard a processor as either
blocked or free. A processor is blocked and suspends execution when it
executes the barrier. After the pth processor is blocked, all the processors
return to the “free state” and resume execution. Think of the barrier as
a treacherous stream to be traversed by all p processors. For safety, they
all congregate on the bank before attempting to cross. When the last
member of the party arrives, they ford the stream in unison and resume
their individual treks.

In Algorithm 6.1.5, the processors are blocked after computing their
portion of the matrix-vector product. We cannot predict the order in which
these blockings occur, but once the last processor reaches the barrier, they
are all released and Proc(1) can carry out the vector summation.
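
The barrier logic of Algorithm 6.1.5 can be tried out with threads standing in for processors. This is a minimal sketch, not a performance model: it assumes Python's standard `threading.Barrier` as the barrier construct, stores matrices as lists of rows, and uses 0-based indices internally. The function name `barrier_gaxpy` is hypothetical.

```python
import threading

def barrier_gaxpy(A, x, y, p):
    """Overwrite y with y + A x using p threads and one barrier."""
    n = len(y)
    r = n // p                                # assumes n = r*p
    W = [[0.0] * p for _ in range(n)]         # shared array W(1:n, 1:p)
    barrier = threading.Barrier(p)

    def node(mu):                             # mu = 1..p plays the role of the node id
        cols = range((mu - 1) * r, mu * r)    # 0-based version of col
        w_loc = [sum(A[i][j] * x[j] for j in cols) for i in range(n)]
        for i in range(n):
            W[i][mu - 1] = w_loc[i]           # W(:, mu) = w_loc
        barrier.wait()                        # blocked until all p contributions are stored
        if mu == 1:                           # only Proc(1) performs the summation
            for j in range(p):
                for i in range(n):
                    y[i] += W[i][j]

    threads = [threading.Thread(target=node, args=(mu,)) for mu in range(1, p + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return y

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(barrier_gaxpy(A, [1, 1, 1, 1], [1, 0, 0, 0], p=2))
```

Removing the `barrier.wait()` call would reintroduce the flaw described above, since thread 1 could start summing columns of W that have not been written yet.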


6.1.13 Dynamic Scheduling 


Instead of having one processor in charge of the vector summation, it is 
tempting to have each processor add its contribution directly to the global 
variable y. For Proc(μ), this means executing the following:

    r = n/p; col = 1 + (μ - 1)r:μr; A_loc = A(:, col); x_loc = x(col)
    w_loc = A_loc x_loc
    y_loc = y; y_loc = y_loc + w_loc; y = y_loc

However, a problem concerns the read-update-write triplet

    y_loc = y; y_loc = y_loc + w_loc; y = y_loc

Indeed, if more than one processor is executing this code fragment at the
same time, then there may be a loss of information. Consider the following
sequence:


Proc(1) reads y 
Proc(2) reads y 
Proc(1) writes y 
Proc(2) writes y 


The contribution of Proc(1) is lost because Proc(1) and Proc(2) obtain the 
same version of y. As a result, the effect of the Proc(1) write is erased by 
the Proc(2) write. 

To prevent this kind of thing from happening, most shared memory
systems support the idea of a critical section. These are special, isolated
portions of a node program that require a "key" to enter. Throughout the
system, there is only one key and so the net effect is that only one processor
can be executing in a critical section at any given time.


Algorithm 6.1.6 Suppose A ∈ R^{n×n}, x ∈ R^n, and y ∈ R^n reside in a
global memory accessible to p processors. If n = pr and each processor
executes the following algorithm, then upon completion, y is overwritten
by y + Ax. Assume the following initializations in each local memory: p, μ
(the node id), and n.

    r = n/p; col = 1 + (μ - 1)r:μr; A_loc = A(:, col); x_loc = x(col)
    w_loc = A_loc x_loc
    begin critical section
        y_loc = y
        y_loc = y_loc + w_loc
        y = y_loc
    end critical section


This use of the critical section concept controls the update of y in a way 
that ensures correctness. The algorithm is dynamically scheduled because 
the order in which the summations occur is determined as the computation 
unfolds. Dynamic scheduling is very important in problems with irregular 
structure. 
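
The critical section of Algorithm 6.1.6 maps naturally onto a mutual exclusion lock. The sketch below assumes Python threads as stand-ins for processors and uses a single `threading.Lock` as the one "key" in the system. The function name `critical_section_gaxpy` is hypothetical, and the order in which threads enter the critical section is not controlled.

```python
import threading

def critical_section_gaxpy(A, x, y, p):
    """Overwrite y with y + A x; each thread adds its own contribution to y."""
    n = len(y)
    r = n // p                                # assumes n = p*r
    key = threading.Lock()                    # the single key of the critical section

    def node(mu):                             # mu = 1..p plays the role of the node id
        cols = range((mu - 1) * r, mu * r)    # 0-based version of col
        w_loc = [sum(A[i][j] * x[j] for j in cols) for i in range(n)]
        with key:                             # begin critical section
            y_loc = list(y)                   # y_loc = y
            y_loc = [a + b for a, b in zip(y_loc, w_loc)]
            y[:] = y_loc                      # y = y_loc
        # end critical section: the lock is released on leaving the with-block

    threads = [threading.Thread(target=node, args=(mu,)) for mu in range(1, p + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return y

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(critical_section_gaxpy(A, [1, 1, 1, 1], [1, 0, 0, 0], p=4))
```

Because the read-update-write triplet is executed under the lock, no contribution can be overwritten, whatever order the threads happen to arrive in.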

Problems


P6.1.1 Modify Algorithm 6.1.1 so that it can handle arbitrary n.

P6.1.2 Modify Algorithm 6.1.2 so that it efficiently handles the upper triangular case.

P6.1.3 (a) Modify Algorithms 6.1.3 and 6.1.4 so that they overwrite y with z = y + A^m x
for a given positive integer m that is available to each processor. (b) Modify Algorithms
6.1.3 and 6.1.4 so that y is overwritten by z = y + A^T A x.


P6.1.4 Modify Algorithm 6.1.3 so that upon completion, the local array A_loc in Proc(μ)
houses the μth block column of A + xy^T.


P6.1.5 Modify Algorithm 6.1.4 so that (a) A is overwritten by the outer product update
A + xy^T, (b) x is overwritten with A^2 x, (c) y is overwritten by a unit 2-norm vector in
the direction of y + A^k x, and (d) it efficiently handles the case when A is lower triangular.


Notes and References for Sec. 6.1

6.2 Matrix Multiplication


In this section we develop two parallel algorithms for matrix-matrix multi- 
plication. A shared memory implementation is used to illustrate the effect 
of blocking on granularity and load balancing. A torus implementation is 
designed to convey the spirit of two-dimensional data flow. 


6.2.1 A Block Gaxpy Procedure 


Suppose A, B, C ∈ R^{n×n} with B upper triangular and consider the
computation of the matrix multiply update


D=C+AB (6.2.1) 


on a shared memory computer with p processors. Assume that n = rkp 
and partition the update 


$$[\, D_1, \ldots, D_{kp} \,] = [\, C_1, \ldots, C_{kp} \,] + [\, A_1, \ldots, A_{kp} \,]\,[\, B_1, \ldots, B_{kp} \,] \qquad (6.2.2)$$

where each block column has width r = n/(kp). If

$$B_j = \begin{bmatrix} B_{1j} \\ \vdots \\ B_{jj} \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad B_{\tau j} \in \mathbb{R}^{r \times r},$$

then

$$D_j = C_j + A B_j = C_j + \sum_{\tau=1}^{j} A_\tau B_{\tau j}. \qquad (6.2.3)$$

The number of flops required to compute D_j is given by

$$f_j = 2nr^2 j = \left( \frac{2n^3}{k^2p^2} \right) j.$$

This is an increasing function of j because B is upper triangular. As we
discovered in the previous section, the wrap mapping is the way to solve
load imbalance problems that result from triangular matrix structure. This
suggests that we assign Proc(μ) the task of computing D_j for j = μ:p:kp.


Algorithm 6.2.1 Suppose A, B, and C are n-by-n matrices that reside
in a global memory accessible to p processors. Assume that B is upper
triangular and n = rkp. If each processor executes the following algorithm,
then upon completion C is overwritten by D = C + AB. Assume the
following initializations in each local memory: n, r, k, p and μ (the node
id).

    for j = μ:p:kp
        {Compute D_j.}
        B_loc = B(1:jr, 1 + (j - 1)r:jr)
        C_loc = C(:, 1 + (j - 1)r:jr)
        for τ = 1:j
            col = 1 + (τ - 1)r:τr
            A_loc = A(:, col)
            C_loc = C_loc + A_loc B_loc(col, :)
        end
        C(:, 1 + (j - 1)r:jr) = C_loc
    end


Let us examine the degree of load balancing as a function of the parameter
k. For Proc(μ), the number of flops required is given by

$$F(\mu) = \sum_{i=1}^{k} f_{\mu+(i-1)p} \approx \left( k\mu + \frac{k^2 p}{2} \right) \frac{2n^3}{k^2p^2}.$$

The quotient F(p)/F(1) is a measure of load balancing from the flop point
of view. Since

$$\frac{F(p)}{F(1)} = \frac{kp + k^2p/2}{k + k^2p/2} = 1 + \frac{2(p-1)}{2 + kp}$$

we see that arithmetic balance improves with increasing k. A similar
analysis shows that the communication overheads are well balanced as k
increases.
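
The dependence of the balance on k can be tabulated directly from the flop counts. The sketch below is an illustration under the stated cost f_j = 2nr^2 j with r = n/(kp): it sums f_j exactly over j = μ:p:kp and prints the exact quotient F(p)/F(1) next to the approximation 1 + 2(p - 1)/(2 + kp). The function names and the sample values of n and p are assumptions.

```python
def flops_per_proc(mu, n, k, p):
    # F(mu) = sum of f_j over j = mu:p:kp, with f_j = 2*n*r^2*j and r = n/(k*p).
    r = n // (k * p)
    return sum(2 * n * r * r * j for j in range(mu, k * p + 1, p))

def balance_ratio(n, k, p):
    # F(p)/F(1): the most heavily loaded processor relative to the least loaded one.
    return flops_per_proc(p, n, k, p) / flops_per_proc(1, n, k, p)

n, p = 960, 4
for k in (1, 2, 4, 8, 16):
    approx = 1 + 2 * (p - 1) / (2 + k * p)
    print(k, round(balance_ratio(n, k, p), 4), round(approx, 4))
```

The exact sum is kμ + k(k - 1)p/2 times 2nr^2, so the approximation is only accurate for larger k, but both columns decrease toward 1 as k grows.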

On the other hand, the total number of global memory reads and writes
associated with Algorithm 6.2.1 increases with the square of k. If the
start-up parameter α_s in (6.1.5) is large, then performance can degrade with
increased k.

The optimum choice for k given these two opposing forces is system
dependent. If communication is fast, then smaller tasks can be supported
without penalty and this makes it easier to achieve load balancing. A
multiprocessor with this attribute supports fine-grained parallelism. However,
if granularity is too fine in a system with high-performance nodes, then it
may be impossible for the node programs to perform at level-2 or level-3
speeds simply because there just is not enough local linear algebra. Again,
benchmarking is the only way to clarify these issues.


6.2.2 Torus 


A torus is a two-dimensional processor array in which each row and
column is a ring. See FIGURE 6.2.1. A processor id in this context is an
ordered pair and each processor has four neighbors. In the displayed
example, Proc(1,3) has west neighbor Proc(1,2), east neighbor Proc(1,4), south
neighbor Proc(2,3), and north neighbor Proc(4,3).

FIGURE 6.2.1 A Four-by-Four Torus

To show what it is like to organize a toroidal matrix computation, we
develop an algorithm for the matrix multiplication D = C + AB where
A, B, C ∈ R^{n×n}. Assume that the torus is p_1-by-p_1 and that n = r p_1.
Regard A = (A_ij), B = (B_ij), and C = (C_ij) as p_1-by-p_1 block matrices
with r-by-r blocks. Assume that Proc(i, j) contains A_ij, B_ij, and C_ij and
that its mission is to overwrite C_ij with

$$D_{ij} = C_{ij} + \sum_{k=1}^{p_1} A_{ik} B_{kj}.$$

We develop the general algorithm from the p_1 = 3 case, displaying the torus
in cellular form as follows:

    Proc(1,1)   Proc(1,2)   Proc(1,3)
    Proc(2,1)   Proc(2,2)   Proc(2,3)
    Proc(3,1)   Proc(3,2)   Proc(3,3)

Let us focus attention on Proc(1,1) and the calculation of

$$D_{11} = C_{11} + A_{11}B_{11} + A_{12}B_{21} + A_{13}B_{31}.$$

Suppose the six inputs that define this block dot product are positioned
within the torus as follows:

    A_11 B_11   A_12  ·     A_13  ·
     ·   B_21    ·    ·      ·    ·
     ·   B_31    ·    ·      ·    ·

(Pay no attention to the “dots.” They are later replaced by various A_ij
and B_ij.)

Our plan is to "ratchet" the first block row of A and the first block
column of B through Proc(1,1) in a coordinated fashion. The pairs A_11
and B_11, A_12 and B_21, and A_13 and B_31 meet, are multiplied, and added
into a running sum array C_loc:

    C_loc = C_loc + A_12 B_21

    C_loc = C_loc + A_13 B_31

    C_loc = C_loc + A_11 B_11

Thus, after three steps, the local array C_loc in Proc(1,1) houses D_11.

We have organized the flow of data so that the A_1j migrate westwards
and the B_i1 migrate northwards through the torus. It is thus apparent that
Proc(1,1) must execute a node program of the form:


    for t = 1:3
        send(A_loc, west)
        send(B_loc, north)
        recv(A_loc, east)
        recv(B_loc, south)
        C_loc = C_loc + A_loc B_loc
    end

The send-recv-send-recv sequence

    for t = 1:3
        send(A_loc, west)
        recv(A_loc, east)
        send(B_loc, north)
        recv(B_loc, south)
        C_loc = C_loc + A_loc B_loc
    end

also works. However, this induces unnecessary delays into the process
because the B submatrix is not sent until the new A submatrix arrives.

We next consider the activity in Proc(1,2), Proc(1,3), Proc(2,1), and
Proc(3,1). At this point in the development, these processors merely help
circulate blocks A_11, A_12, and A_13 and B_11, B_21, and B_31, respectively. If
B_32, B_12, and B_22 flowed through Proc(1,2) during these steps, then

$$D_{12} = C_{12} + A_{13}B_{32} + A_{11}B_{12} + A_{12}B_{22}$$

could be formed. Likewise, Proc(1,3) could compute

$$D_{13} = C_{13} + A_{11}B_{13} + A_{12}B_{23} + A_{13}B_{33}$$

if B_13, B_23, and B_33 are available during t = 1:3. To this end we initialize
the torus as follows

    A_11 B_11   A_12 B_22   A_13 B_33
     ·   B_21    ·   B_32    ·   B_13
     ·   B_31    ·   B_12    ·   B_23

With northward flow of the B_ij we get

[Torus displays for t = 1, t = 2, and t = 3]


Thus, if B is mapped onto the torus in a “staggered start” fashion, we can
arrange for the first row of processors to compute the first row of C.

If we stagger the second and third rows of A in a similar fashion, then
we can arrange for all nine processors to perform a multiply-add at each
step. In particular, if we set

    A_11 B_11   A_12 B_22   A_13 B_33
    A_22 B_21   A_23 B_32   A_21 B_13
    A_33 B_31   A_31 B_12   A_32 B_23

then with westward flow of the A_ij and northward flow of the B_ij we obtain

From this example we are ready to specify the general algorithm. We
assume that at the start, Proc(i, j) houses A_ij, B_ij, and C_ij. To obtain the
necessary staggering of the A data, we note that in processor row i the A_ij
should be circulated westward i - 1 positions. Likewise, in the jth column
of processors, the B_ij should be circulated northward j - 1 positions. This
gives the following algorithm:


Algorithm 6.2.2 Suppose A ∈ R^{n×n}, B ∈ R^{n×n}, and C ∈ R^{n×n} are given
and that D = C + AB. If each processor in a p_1-by-p_1 torus executes
the following algorithm and n = p_1 r, then upon completion Proc(μ, λ)
houses D_μλ in local variable C_loc. Assume the following local memory
initializations: p_1, (μ, λ) (the node id), north, east, south, and west (the
four neighbor ids), row = 1 + (μ - 1)r:μr, col = 1 + (λ - 1)r:λr, A_loc =
A(row, col), B_loc = B(row, col), and C_loc = C(row, col).

    {Stagger the A_μj and B_iλ.}
    for k = 1:μ - 1
        send(A_loc, west); recv(A_loc, east)
    end
    for k = 1:λ - 1
        send(B_loc, north); recv(B_loc, south)
    end
    for k = 1:p_1
        C_loc = C_loc + A_loc B_loc
        send(A_loc, west)
        send(B_loc, north)
        recv(A_loc, east)
        recv(B_loc, south)
    end
    {Unstagger the A_μj and B_iλ.}
    for k = 1:μ - 1
        send(A_loc, east); recv(A_loc, west)
    end
    for k = 1:λ - 1
        send(B_loc, south); recv(B_loc, north)
    end


It is not hard to show that the communication-to-computation ratio for
this algorithm goes to zero as n/p_1 increases.
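
The data flow of Algorithm 6.2.2 can be simulated sequentially. The sketch below is an assumption-laden illustration rather than a message-passing program: the send and recv pairs are modeled by cyclic shifts of a p_1-by-p_1 grid of blocks, indices are 0-based, and the unstagger phase is omitted because only C_loc is inspected. The helper names (`to_blocks`, `from_blocks`, `torus_multiply`) are hypothetical.

```python
def matmul_add(C, A, B):
    # Return C + A*B for square blocks stored as lists of rows.
    r = len(A)
    return [[C[i][j] + sum(A[i][k] * B[k][j] for k in range(r)) for j in range(r)]
            for i in range(r)]

def to_blocks(M, p1):
    # Split an n-by-n matrix into a p1-by-p1 grid of r-by-r blocks.
    r = len(M) // p1
    return [[[row[j * r:(j + 1) * r] for row in M[i * r:(i + 1) * r]]
             for j in range(p1)] for i in range(p1)]

def from_blocks(G):
    # Reassemble a grid of blocks into a single matrix.
    return [sum((blk[k] for blk in brow), []) for brow in G for k in range(len(brow[0]))]

def torus_multiply(A, B, C, p1):
    """Simulate the torus algorithm on grids of blocks; returns the grid of C_loc."""
    # Stagger: block row i of A moves west i positions, block column j of B
    # moves north j positions (0-based, so row 0 and column 0 stay put).
    Aloc = [[A[i][(j + i) % p1] for j in range(p1)] for i in range(p1)]
    Bloc = [[B[(i + j) % p1][j] for j in range(p1)] for i in range(p1)]
    Cloc = [[C[i][j] for j in range(p1)] for i in range(p1)]
    for _ in range(p1):
        Cloc = [[matmul_add(Cloc[i][j], Aloc[i][j], Bloc[i][j])
                 for j in range(p1)] for i in range(p1)]
        # Send A west and B north, then receive from the east and south neighbors.
        Aloc = [[Aloc[i][(j + 1) % p1] for j in range(p1)] for i in range(p1)]
        Bloc = [[Bloc[(i + 1) % p1][j] for j in range(p1)] for i in range(p1)]
    return Cloc

n, p1 = 6, 3
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i * j + 1 for j in range(n)] for i in range(n)]
C = [[1] * n for _ in range(n)]
D = from_blocks(torus_multiply(to_blocks(A, p1), to_blocks(B, p1), to_blocks(C, p1), p1))
print(D[0])
```

After the stagger, the block pair held by every simulated processor has matching inner indices, and each shift preserves that property, which is why all p_1^2 processors can perform a multiply-add at every step.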


Problems 


P6.2.1 Develop a ring implementation for Algorithm 6.2.1. 


P6.2.2 An upper triangular matrix can be overwritten with its square without any
additional workspace. Write a dynamically scheduled, shared-memory procedure for
doing this.

6.3 Factorizations


In this section we present a pair of parallel Cholesky factorizations. To 
illustrate what a distributed memory factorization looks like, we implement 
the gaxpy Cholesky algorithm on a ring. A shared memory implementation 
of outer product Cholesky is also detailed. 


6.3.1 A Ring Cholesky 


Let us see how the Cholesky factorization procedure can be distributed on
a ring of p processors. The starting point is the equation

$$G(\mu,\mu)\,G(\mu{:}n,\mu) = A(\mu{:}n,\mu) - \sum_{j=1}^{\mu-1} G(\mu,j)\,G(\mu{:}n,j) \equiv v(\mu{:}n).$$

This equation is obtained by equating the μth column in the n-by-n
equation A = GG^T. Once the vector v(μ:n) is found then G(μ:n, μ) is a simple
scaling:

$$G(\mu{:}n,\mu) = v(\mu{:}n)/\sqrt{v(\mu)}.$$

For clarity, we first assume that n = p and that Proc(μ) initially houses
A(μ:n, μ). Upon completion, each processor overwrites its A-column with
the corresponding G-column. For Proc(μ) this process involves μ - 1 saxpy
updates of the form

$$A(\mu{:}n,\mu) \leftarrow A(\mu{:}n,\mu) - G(\mu,j)\,G(\mu{:}n,j)$$

followed by a square root and a scaling. The general structure of Proc(μ)'s
node program is therefore as follows:


    for j = 1:μ - 1
        Receive a G-column from the left neighbor.
        If necessary, send a copy of the received G-column to
        the right neighbor.
        Update A(μ:n, μ).
    end
    Generate G(μ:n, μ) and, if necessary, send it to the
    right neighbor.

Thus Proc(1) immediately computes G(1:n, 1) = A(1:n, 1)/sqrt(A(1, 1)) and
sends it to Proc(2). As soon as Proc(2) receives this column it can generate
G(2:n, 2) and pass it to Proc(3), etc. With this pipelining arrangement we
can assert that once a processor computes its G-column, it can quit. It
also follows that each processor receives G-columns in ascending order, i.e.,
G(1:n, 1), G(2:n, 2), etc. Based on these observations we have

    j = 1
    while j < μ
        recv(g_loc(j:n), left)
        if μ < n
            send(g_loc(j:n), right)
        end
        A_loc(μ:n) = A_loc(μ:n) - g_loc(μ)g_loc(μ:n)
        j = j + 1
    end
    A_loc(μ:n) = A_loc(μ:n)/sqrt(A_loc(μ))
    if μ < n
        send(A_loc(μ:n), right)
    end

Note that the number of received G-columns is given by j - 1. If j = μ,
then it is time for Proc(μ) to generate and send G(μ:n, μ).

We now extend this strategy to the general n case. There are two
obvious ways to distribute the computation. We could require each processor
to compute a contiguous set of G-columns. For example, if n = 11, p = 3,
and A = [a_1, ..., a_11], then we could distribute A as follows

    [ a_1 a_2 a_3 a_4 | a_5 a_6 a_7 a_8 | a_9 a_10 a_11 ]
          Proc(1)           Proc(2)          Proc(3)

Each processor could then proceed to find the corresponding G columns.
The trouble with this approach is that (for example) Proc(1) is idle after
the fourth column of G is found even though much work remains.

Greater load balancing results if we distribute the computational tasks
using the wrap mapping, i.e.,

    [ a_1 a_4 a_7 a_10 | a_2 a_5 a_8 a_11 | a_3 a_6 a_9 ]
          Proc(1)            Proc(2)          Proc(3)

In this scheme Proc(μ) carries out the construction of G(:, μ:p:n). When
a given processor finishes computing its G-columns, each of the other
processors has at most one more G column to find. Thus if n/p ≫ 1, then all
of the processors are busy most of the time.

Let us examine the details of a wrap-distributed Cholesky procedure.
Each processor maintains a pair of counters. The counter j is the
index of the next G-column to be received by Proc(μ). A processor also
needs to know the index of the next G-column that it is to produce. Note
that if col = μ:p:n, then Proc(μ) is responsible for G(:, col) and that
L = length(col) is the number of the G-columns that it must compute.
We use q to indicate the status of G-column production. At any instant,
col(q) is the index of the next G-column to be produced.


Algorithm 6.3.1 Suppose A ∈ R^{n×n} is symmetric and positive
definite and that A = GG^T is its Cholesky factorization. If each node in
a p-processor ring executes the following program, then upon completion
Proc(μ) houses G(k:n, k) for k = μ:p:n in a local array A_loc(1:n, L) where
L = length(col) and col = μ:p:n. In particular, G(col(q):n, col(q)) is
housed in A_loc(col(q):n, q) for q = 1:L. Assume the following local memory
initializations: p, μ (the node id), left and right (the neighbor id's), n, and
A_loc = A(:, μ:p:n).

    j = 1; q = 1; col = μ:p:n; L = length(col)
    while q ≤ L
        if j = col(q)
            {Form G(j:n, j).}
            A_loc(j:n, q) = A_loc(j:n, q)/sqrt(A_loc(j, q))
            if j < n
                send(A_loc(j:n, q), right)
            end
            j = j + 1
            {Update local columns.}
            for k = q + 1:L
                r = col(k)
                A_loc(r:n, k) = A_loc(r:n, k) - A_loc(r, q)A_loc(r:n, q)
            end
            q = q + 1
        else
            recv(g_loc(j:n), left)
            Compute α, the id of the processor that generated the
            received G-column.
            Compute β, the index of Proc(right)'s final column.
            if right ≠ α and j < β
                send(g_loc(j:n), right)
            end
            {Update local columns.}
            for k = q:L
                r = col(k)
                A_loc(r:n, k) = A_loc(r:n, k) - g_loc(r)g_loc(r:n)
            end
            j = j + 1
        end
    end


To illustrate the logic of the pointer system we consider a sample 3-processor
situation with n = 10. Assume that the three local values of q are 3, 2, and
2 and that the corresponding values of col(q) are 7, 5, and 6:

               ↓              ↓             ↓
    [ a_1 a_4 a_7 a_10 | a_2 a_5 a_8 | a_3 a_6 a_9 ]
          Proc(1)          Proc(2)       Proc(3)

Proc(2) now generates the fifth G-column and increments its q to 3.

The decision to pass a received G-column to the right neighbor needs
to be explained. Two conditions must be fulfilled:

• The right neighbor must not be the processor which generated the G
column. This way the circulation of the received G-column is properly
terminated.

• The right neighbor must still have more G-columns to generate.
Otherwise, a G-column will be sent to an inactive processor.

This kind of reasoning is quite typical in distributed memory matrix
computations.
Let us examine the behavior of Algorithm 6.3.1 under the assumption
that n ≫ p. It is not hard to show that Proc(μ) performs

$$F(\mu) = \sum_{k=1}^{L} 2\bigl(n - (\mu + (k-1)p)\bigr)\bigl(\mu + (k-1)p\bigr) \approx \frac{n^3}{3p}$$

flops. Each processor receives and sends just about every G-column. Using
our communication overhead model (6.1.3), we see that the time each
processor spends communicating is given by

$$\tau_{comm} = \sum_{j=1}^{n} 2\bigl(\alpha_d + \beta_d (n - j)\bigr) \approx 2\alpha_d n + \beta_d n^2.$$

If we assume that computation proceeds at R flops per second, then the
computation/communication ratio for Algorithm 6.3.1 is approximately
given by (n/p)(1/(3Rβ_d)). Thus, communication overheads diminish in
importance as n/p grows.
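
Both estimates are easy to check numerically. The sketch below evaluates the flop sum F(μ) over the wrap-mapped columns μ:p:n and the communication sum under the model (6.1.3). The function names and the sample values of n and p are assumptions made for illustration.

```python
def ring_cholesky_flops(mu, n, p):
    # F(mu): sum of 2(n - j) j over the columns j = mu:p:n owned by Proc(mu).
    return sum(2 * (n - j) * j for j in range(mu, n + 1, p))

def comm_time(n, alpha_d, beta_d):
    # Each G-column j is received and sent once: 2(alpha_d + beta_d (n - j)).
    return sum(2 * (alpha_d + beta_d * (n - j)) for j in range(1, n + 1))

n, p = 3000, 8
estimate = n ** 3 / (3 * p)
for mu in (1, p):
    print(mu, ring_cholesky_flops(mu, n, p) / estimate)
```

Summed over all processors the flop count is exactly (n^3 - n)/3, and for n ≫ p each processor's share is close to n^3/(3p), which is the load balance that the wrap mapping is meant to deliver.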


6.3.2 A Shared Memory Cholesky 


Next we consider a shared memory implementation of the outer product 
Cholesky algorithm: 

    for k = 1:n
        A(k:n, k) = A(k:n, k)/sqrt(A(k, k))
        for j = k + 1:n
            A(j:n, j) = A(j:n, j) - A(j:n, k)A(j, k)
        end
    end



The j-loop oversees an outer product update. The n - k saxpy operations
that make up its body are independent and easily parallelized. The scaling
A(k:n, k) can be carried out by a single processor with no threat to load
balancing.


Algorithm 6.3.2 Suppose A ∈ R^{n×n} is a symmetric positive definite
matrix stored in a shared memory accessible to p processors. If each
processor executes the following algorithm, then upon completion the lower
triangular part of A is overwritten with its Cholesky factor. Assume the
following initializations in each local memory: n, p and μ (the node id).

    for k = 1:n
        if μ = 1
            v_loc(k:n) = A(k:n, k)
            v_loc(k:n) = v_loc(k:n)/sqrt(v_loc(k))
            A(k:n, k) = v_loc(k:n)
        end
        barrier
        v_loc(k + 1:n) = A(k + 1:n, k)
        for j = (k + μ):p:n
            w_loc(j:n) = A(j:n, j)
            w_loc(j:n) = w_loc(j:n) - v_loc(j)v_loc(j:n)
            A(j:n, j) = w_loc(j:n)
        end
        barrier
    end

The scaling before the j-loop represents very little work compared to the
outer product update and so it is reasonable to assign that portion of the
computation to a single processor. Notice that two barrier statements are
required. The first ensures that a processor does not begin working on the
kth outer product update until the kth column of G is made available by
Proc(1). The second barrier prevents the processing of the (k+1)st step from
beginning until the kth step is completely finished.
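
The two-barrier structure of Algorithm 6.3.2 can be exercised with threads. The sketch below assumes Python's `threading.Barrier`, stores A as a list of rows with 0-based indices, works on the shared array directly instead of copying into v_loc and w_loc, and expects a symmetric positive definite input (a negative pivot would raise an exception in thread 1 and leave the other threads waiting). The function name `shared_cholesky` is hypothetical.

```python
import math
import threading

def shared_cholesky(A, p):
    """Overwrite the lower triangle of A with its Cholesky factor using p threads."""
    n = len(A)
    barrier = threading.Barrier(p)

    def node(mu):                              # mu = 1..p plays the role of the node id
        for k in range(n):                     # 0-based column index
            if mu == 1:
                d = math.sqrt(A[k][k])
                for i in range(k, n):
                    A[i][k] /= d               # scale column k
            barrier.wait()                     # column k of G is now available
            # Columns k+mu, k+mu+p, ... as in the 1-based loop (k + mu):p:n.
            for j in range(k + mu, n, p):
                v_j = A[j][k]
                for i in range(j, n):
                    A[i][j] -= v_j * A[i][k]
            barrier.wait()                     # step k is completely finished

    threads = [threading.Thread(target=node, args=(mu,)) for mu in range(1, p + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return A

A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
shared_cholesky(A, p=2)
for row in A:
    print(row)
```

During the update phase each thread writes only its own columns and reads only column k, which no thread modifies until the next step, so no critical section is needed.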


Problems


P6.3.1 It is possible to formulate a block version of Algorithm 6.3.1. Suppose n = rN.
For k = 1:N we (a) have Proc(1) generate G(:, 1 + (k - 1)r:kr) and (b) have all processors
participate in the rank r update of the trailing submatrix A(kr + 1:n, kr + 1:n). See §4.2.6.
The coarser granularity may improve performance if the individual processors like level-3
operations.

P6.3.2 Develop a shared memory QR factorization patterned after Algorithm 6.3.2.
Proc(1) should generate the Householder vectors and all processors should share in the
ensuing Householder update.

Chapter 7


The Unsymmetric
Eigenvalue Problem


§7.1 Properties and Decompositions

§7.2 Perturbation Theory

§7.3 Power Iterations

§7.4 The Hessenberg and Real Schur Forms

§7.5 The Practical QR Algorithm

§7.6 Invariant Subspace Computations

§7.7 The QZ Method for Ax = λBx

Having discussed linear equations and least squares, we now direct our 
attention to the third major problem area in matrix computations, the 
algebraic eigenvalue problem. The unsymmetric problem is considered in 
this chapter and the more agreeable symmetric case in the next. 

Our first task is to present the decompositions of Schur and Jordan 
along with the basic properties of eigenvalues and invariant subspaces. The 
contrasting behavior of these two decompositions sets the stage for §7.2 
in which we investigate how the eigenvalues and invariant subspaces of 
a matrix are affected by perturbation. Condition numbers are developed 
that permit estimation of the errors that can be expected to arise because 
of roundoff. 

The key algorithm of the chapter is the justly famous QR algorithm. 
This procedure is the most complex algorithm presented in this book and its 
development is spread over three sections. We derive the basic QR iteration 
in §7.3 as a natural generalization of the simple power method. The next
two sections are devoted to making this basic iteration computationally
feasible. This involves the introduction of the Hessenberg decomposition in 
§7.4 and the notion of origin shifts in §7.5. 


The QR algorithm computes the real Schur form of a matrix, a canonical 
form that displays eigenvalues but not eigenvectors. Consequently, addi- 
tional computations usually must be performed if information regarding 
invariant subspaces is desired. In §7.6, which could be subtitled, “What to 
Do after the Real Schur Form is Calculated,” we discuss various invariant 
subspace calculations that can follow the QR algorithm. 


Finally, in the last section we consider the generalized eigenvalue
problem Ax = λBx and a variant of the QR algorithm that has been devised to
solve it. This algorithm, called the QZ algorithm, underscores the impor- 
tance of orthogonal matrices in the eigenproblem, a central theme of the 
chapter. 


It is appropriate at this time to make a remark about complex versus real 
arithmetic. In this book, we focus on the development of real arithmetic 
algorithms for real matrix problems. This chapter is no exception even 
though a real unsymmetric matrix can have complex eigenvalues. However, 
in the derivation of the practical, real arithmetic QR algorithm and in the 
mathematical analysis of the eigenproblem itself, it is convenient to work 
in the complex field. Thus, the reader will find that we have switched to 
complex notation in §7.1, §7.2, and §7.3. In these sections, we use complex 
versions of the QR factorization, the singular value decomposition, and the 
CS decomposition. 


Before You Begin 


Chapters 1-3 and §§5.1-5.2 are assumed. Within this chapter there are 
the following dependencies: 


§7.1 → §7.2 → §7.3 → §7.4 → §7.5 → §7.6 → §7.7


Complementary references include Fox (1964), Wilkinson (1965), Gourlay 
and Watson (1973), Stewart (1973), Hager (1988), Ciarlet (1989), Stewart 
and Sun (1990), Watkins (1991), Saad (1992), Jennings and Mc Keowen 
(1992), Datta (1995), Trefethen and Bau (1997), and Demmel (1996). Some 
Matlab functions important to this chapter are eig, poly, polyeig, hess, 
qz, rsf2csf, cdf2rdf, schur, and balance. LAPACK connections include

LAPACK: Unsymmetric Eigenproblem


_GEBAL Balance transform
_GEBAK Undo balance transform


Hessenberg reduction U^H A U = H

U (factored form) times matrix (real case) 
Generates U (real case) 

U (factored form) times matrix (complex case) 
Generates U (complex case) 


Schur decomp of general matrix with e.value ordering 

Same but with condition estimates 

Eigenvalues and left and right eigenvectors of general matrix 

Same but with condition estimates 

Selected eigenvectors of upper quasitriangular matrix 

Cond. estimates of selected eigenvalues of upper quasitriangular matrix 
Unitary reordering of Schur decomposition 

Same but with condition estimates 

Solves AX + XB = C for upper quasitriangular A and B


Balance transform 

Reduction to Hessenberg-Triangular form 
Generalized Schur decomposition 
Eigenvectors 

Undo balance transform 


7.1 Properties and Decompositions 


In this section we survey the mathematical background necessary to develop 
and analyze the eigenvalue algorithms that follow. 


7.1.1 Eigenvalues and Invariant Subspaces 


The eigenvalues of a matrix A ∈ C^{n×n} are the n roots of its characteristic
polynomial p(z) = det(zI - A). The set of these roots is called the spectrum
and is denoted by λ(A). If λ(A) = {λ_1, ..., λ_n}, then it follows that

$$\det(A) = \lambda_1 \lambda_2 \cdots \lambda_n.$$

Moreover, if we define the trace of A by

$$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii},$$

then tr(A) = λ_1 + ··· + λ_n. This follows by looking at the coefficient of
z^{n-1} in the characteristic polynomial.
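
For a 2-by-2 matrix the two identities can be verified by hand, since the characteristic polynomial is a quadratic. The sketch below is illustrative only: the helper `eig2` is hypothetical, and the sample matrix [[3, 8], [-2, 3]] is chosen because its eigenvalues are complex even though the matrix is real.

```python
import cmath

def eig2(a, b, c, d):
    # Roots of p(z) = det(zI - A) = z^2 - (a + d) z + (ad - bc) for A = [[a, b], [c, d]].
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

lam1, lam2 = eig2(3, 8, -2, 3)
print(lam1, lam2)                  # the two eigenvalues
print(lam1 * lam2, lam1 + lam2)    # to be compared with det(A) = 25 and tr(A) = 6
```

The product and sum of the roots reproduce the determinant and the trace, as the coefficients of the quadratic require.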
If λ ∈ λ(A), then the nonzero vectors x ∈ C^n that satisfy

$$Ax = \lambda x$$

are referred to as eigenvectors. More precisely, x is a right eigenvector for λ
if Ax = λx and a left eigenvector if x^H A = λx^H. Unless otherwise stated,
“eigenvector” means “right eigenvector.”

An eigenvector defines a one-dimensional subspace that is invariant with
respect to premultiplication by A. More generally, a subspace S ⊆ C^n with
the property that

$$x \in S \implies Ax \in S$$

is said to be invariant (for A). Note that if

$$AX = XB, \qquad B \in \mathbb{C}^{k \times k},\ X \in \mathbb{C}^{n \times k},$$

then ran(X) is invariant and By = λy ⟹ A(Xy) = λ(Xy). Thus, if X has
full column rank, then AX = XB implies that λ(B) ⊆ λ(A). If X is square
and nonsingular, then λ(A) = λ(B) and we say that A and B = X^{-1}AX
are similar. In this context, X is called a similarity transformation.


7.1.2 Decoupling 


Many eigenvalue computations involve breaking the given problem down 
into a collection of smaller eigenproblems. The following result is the basis 
for these reductions. 


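Lemma 7.1.1 If T ∈ C^{n×n} is partitioned as follows,

$$T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, \qquad T_{11} \in \mathbb{C}^{p \times p},\ T_{22} \in \mathbb{C}^{q \times q},$$

then λ(T) = λ(T_11) ∪ λ(T_22).


Proof. Suppose

$$Tx = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \lambda \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

where x_1 ∈ C^p and x_2 ∈ C^q. If x_2 ≠ 0, then T_22 x_2 = λx_2 and so λ ∈
λ(T_22). If x_2 = 0, then T_11 x_1 = λx_1 and so λ ∈ λ(T_11). It follows that
λ(T) ⊆ λ(T_11) ∪ λ(T_22). But since both λ(T) and λ(T_11) ∪ λ(T_22) have the
same cardinality, the two sets are equal. □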


7.1.3 The Basic Unitary Decompositions 


By using similarity transformations, it is possible to reduce a given matrix 
to any one of several canonical forms. The canonical forms differ in how 
they display the eigenvalues and in the kind of invariant subspace informa- 
tion that they provide. Because of their numerical stability we begin by 
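discussing the reductions that can be achieved with unitary similarity.

Lemma 7.1.2 If A ∈ C^{n×n}, B ∈ C^{p×p}, and X ∈ C^{n×p} satisfy

$$AX = XB, \qquad \mathrm{rank}(X) = p, \qquad (7.1.1)$$

then there exists a unitary Q ∈ C^{n×n} such that

$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, \qquad T_{11} \in \mathbb{C}^{p \times p},\ T_{22} \in \mathbb{C}^{(n-p) \times (n-p)}. \qquad (7.1.2)$$


Proof. Let

$$X = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix}, \qquad Q \in \mathbb{C}^{n \times n},\ R_1 \in \mathbb{C}^{p \times p}$$

be a QR factorization of X. By substituting this into (7.1.1) and
rearranging we have

$$\begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} B$$

where

$$Q^H A Q = \begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix}, \qquad T_{11} \in \mathbb{C}^{p \times p},\ T_{22} \in \mathbb{C}^{(n-p) \times (n-p)}.$$

By using the nonsingularity of R_1 and the equations T_21 R_1 = 0 and T_11 R_1 =
R_1 B, we can conclude that T_21 = 0 and λ(T_11) = λ(B). The conclusion
now follows because from Lemma 7.1.1 λ(A) = λ(T) = λ(T_11) ∪ λ(T_22). □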


Example 7.1.1 If

$$A = \begin{bmatrix} 67.00 & 177.60 & -63.20 \\ -20.40 & 95.88 & -87.16 \\ 22.80 & 67.84 & 12.12 \end{bmatrix},$$

X = [20, -9, -12]^T and B = [25], then AX = XB. Moreover, if the orthogonal matrix
Q is defined by

$$Q = \begin{bmatrix} -.800 & .360 & .480 \\ .360 & .928 & -.096 \\ .480 & -.096 & .872 \end{bmatrix},$$

then Q^T X = [-25, 0, 0]^T and

$$Q^T A Q = T = \begin{bmatrix} 25 & -90 & 5 \\ 0 & 147 & -104 \\ 0 & 146 & 3 \end{bmatrix}.$$

A calculation shows that λ(A) = {25, 75 + 100i, 75 - 100i}.
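
The claims of Example 7.1.1 can be confirmed with a few matrix products. The sketch below uses plain Python lists and hypothetical helpers `matmul` and `transpose`. The entries of A and Q are the example's values, with the (1,3) entry of A and the (2,1) entry of Q taken to be -63.20 and .360, the values forced by AX = XB and by the symmetry of Q.

```python
def matmul(M, N):
    # Product of two matrices stored as lists of rows.
    return [[sum(M[i][k] * N[k][j] for k in range(len(N))) for j in range(len(N[0]))]
            for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[67.00, 177.60, -63.20],
     [-20.40, 95.88, -87.16],
     [22.80, 67.84, 12.12]]
X = [[20.0], [-9.0], [-12.0]]
Q = [[-0.800, 0.360, 0.480],
     [0.360, 0.928, -0.096],
     [0.480, -0.096, 0.872]]

AX = matmul(A, X)                          # should equal 25 * X
QtX = matmul(transpose(Q), X)              # should equal [-25, 0, 0]^T
T = matmul(transpose(Q), matmul(A, Q))     # should be block upper triangular
for row in T:
    print([round(t, 6) for t in row])
```

The trailing 2-by-2 block of T has trace 150 and determinant 15625, so its eigenvalues are 75 ± 100i, in agreement with the spectrum quoted in the example.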


Lemma 7.1.2 says that a matrix can be reduced to block triangular form 
using unitary similarity transformations if we know one of its invariant 
subspaces. By induction we can readily establish the decomposition of 
Schur (1909).

Theorem 7.1.3 (Schur Decomposition) If A ∈ C^{n×n}, then there exists
a unitary Q ∈ C^{n×n} such that

$$Q^H A Q = T = D + N \qquad (7.1.3)$$

where D = diag(λ_1, ..., λ_n) and N ∈ C^{n×n} is strictly upper triangular.
Furthermore, Q can be chosen so that the eigenvalues λ_i appear in any
order along the diagonal.


Proof. The theorem obviously holds when n = 1. Suppose it holds for all
matrices of order n - 1 or less. If Ax = λx, where x ≠ 0, then by Lemma
7.1.2 (with B = (λ)) there exists a unitary U such that:

$$U^H A U = \begin{bmatrix} \lambda & w^H \\ 0 & C \end{bmatrix}, \qquad C \in \mathbb{C}^{(n-1) \times (n-1)}.$$

By induction there is a unitary Ũ such that Ũ^H C Ũ is upper triangular.
Thus, if Q = U diag(1, Ũ), then Q^H AQ is upper triangular. □


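Example 7.1.2 If

$$A = \begin{bmatrix} 3 & 8 \\ -2 & 3 \end{bmatrix} \quad \text{and} \quad Q = \begin{bmatrix} .8944i & .4472 \\ -.4472 & -.8944i \end{bmatrix},$$

then Q is unitary and

$$Q^H A Q = \begin{bmatrix} 3+4i & -6 \\ 0 & 3-4i \end{bmatrix}.$$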
If $Q = [\,q_1,\ldots,q_n\,]$ is a column partitioning of the unitary matrix Q in
(7.1.3), then the $q_i$ are referred to as Schur vectors. By equating columns
in the equations AQ = QT we see that the Schur vectors satisfy
\[ A q_k = \lambda_k q_k + \sum_{i=1}^{k-1} n_{ik}\, q_i, \qquad k = 1{:}n. \tag{7.1.4} \]
From this we conclude that the subspaces
\[ S_k = \mathrm{span}\{q_1,\ldots,q_k\}, \qquad k = 1{:}n \]
are invariant. Moreover, it is not hard to show that if $Q_k = [\,q_1,\ldots,q_k\,]$,
then $\lambda(Q_k^H A Q_k) = \{\lambda_1,\ldots,\lambda_k\}$. Since the eigenvalues in (7.1.3) can be
arbitrarily ordered, it follows that there is at least one k-dimensional invariant
subspace associated with each subset of k eigenvalues.

Another conclusion to be drawn from (7.1.4) is that the Schur vector $q_k$
is an eigenvector if and only if the k-th column of N is zero. This turns out
to be the case for k = 1:n whenever $A^H A = A A^H$. Matrices that satisfy
this property are called normal.
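
The Schur relations above can be checked numerically. The following is a minimal NumPy sketch, not part of the original text, that uses the matrix of Example 7.1.2. The exact form chosen for Q (entries $2/\sqrt{5}$ and $1/\sqrt{5}$ in place of the rounded .8944 and .4472) is an assumption made for illustration.

```python
import numpy as np

# Matrix from Example 7.1.2. Q is an assumed exact form of the rounded
# entries quoted in the example (.8944 ~ 2/sqrt(5), .4472 ~ 1/sqrt(5)).
A = np.array([[3.0, 8.0], [-2.0, 3.0]])
Q = np.array([[2j, 1.0], [-1.0, -2j]]) / np.sqrt(5.0)

# T = Q^H A Q should be upper triangular with the eigenvalues on its diagonal.
T = Q.conj().T @ A @ Q

is_unitary = np.allclose(Q.conj().T @ Q, np.eye(2))

# The first Schur vector is always an eigenvector (k = 1 in (7.1.4)).
q1 = Q[:, 0]
residual = np.linalg.norm(A @ q1 - T[0, 0] * q1)
```

Here `T[0, 1]` is the single entry of N, so the second Schur vector is not an eigenvector of this (nonnormal) matrix.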
Corollary 7.1.4 $A \in \mathbb{C}^{n\times n}$ is normal if and only if there exists a unitary
$Q \in \mathbb{C}^{n\times n}$ such that $Q^H A Q = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$.


Proof. It is easy to show that if A is unitarily similar to a diagonal matrix,
then A is normal. On the other hand, if A is normal and $Q^H A Q = T$ is
its Schur decomposition, then T is also normal. The corollary follows by
showing that a normal, upper triangular matrix is diagonal. □


Note that if $Q^H A Q = T = \mathrm{diag}(\lambda_i) + N$ is a Schur decomposition of a
general n-by-n matrix A, then $\|N\|_F$ is independent of the choice of Q:
\[ \|N\|_F^2 = \|A\|_F^2 - \sum_{i=1}^{n} |\lambda_i|^2 \equiv \Delta^2(A). \]
This quantity is referred to as A's departure from normality. Thus, to
make T "more diagonal," it is necessary to rely on nonunitary similarity
transformations.
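
Because the departure from normality depends only on $\|A\|_F$ and the eigenvalues, it can be evaluated without forming a Schur decomposition. A small sketch follows; the function name `departure_from_normality` is a hypothetical helper introduced here, not something from the text.

```python
import numpy as np

def departure_from_normality(A):
    """Return Delta(A) = sqrt(||A||_F^2 - sum_i |lambda_i|^2)."""
    A = np.asarray(A, dtype=complex)
    eigenvalues = np.linalg.eigvals(A)
    delta_sq = np.linalg.norm(A, "fro") ** 2 - np.sum(np.abs(eigenvalues) ** 2)
    # Rounding can make delta_sq slightly negative for normal matrices.
    return float(np.sqrt(max(delta_sq, 0.0)))
```

For the matrix of Example 7.1.2 this gives 6, the modulus of the single entry of N, and for a normal matrix it is zero up to rounding.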


7.1.4  Nonunitary Reductions 


To see what is involved in nonunitary similarity reduction, we examine the 
block diagonalization of a 2-by-2 block triangular matrix. 


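Lemma 7.1.5 Let $T \in \mathbb{C}^{n\times n}$ be partitioned as follows:
\[ T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, \]
with $T_{11}$ p-by-p and $T_{22}$ q-by-q.
Define the linear transformation $\phi: \mathbb{C}^{p\times q} \to \mathbb{C}^{p\times q}$ by
\[ \phi(X) = T_{11}X - XT_{22}, \]
where $X \in \mathbb{C}^{p\times q}$. Then $\phi$ is nonsingular if and only if $\lambda(T_{11}) \cap \lambda(T_{22}) = \emptyset$.
If $\phi$ is nonsingular and Y is defined by
\[ Y = \begin{bmatrix} I_p & Z \\ 0 & I_q \end{bmatrix}, \qquad \phi(Z) = -T_{12}, \]
then $Y^{-1}TY = \mathrm{diag}(T_{11}, T_{22})$.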


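Proof. Suppose $\phi(X) = 0$ for $X \neq 0$ and that
\[ U^H X V = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \]
(with $\Sigma_r$ r-by-r) is the SVD of X with $\Sigma_r = \mathrm{diag}(\sigma_i)$, r = rank(X). Substituting this into
the equation $T_{11}X = XT_{22}$ gives
\[ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
   \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix}
   = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix}
   \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} \]
where $U^H T_{11} U = (A_{ij})$ and $V^H T_{22} V = (B_{ij})$. By comparing blocks we see
that $A_{21} = 0$, $B_{12} = 0$, and $\lambda(A_{11}) = \lambda(B_{11})$. Consequently,
\[ \emptyset \neq \lambda(A_{11}) = \lambda(B_{11}) \subseteq \lambda(T_{11}) \cap \lambda(T_{22}). \]
On the other hand, if $\lambda \in \lambda(T_{11}) \cap \lambda(T_{22})$ then we have nonzero vectors x
and y so $T_{11}x = \lambda x$ and $y^H T_{22} = \lambda y^H$. A calculation shows that $\phi(xy^H) = 0$.
Finally, if $\phi$ is nonsingular then the matrix Z above exists and
\[ Y^{-1}TY
   = \begin{bmatrix} I & -Z \\ 0 & I \end{bmatrix}
     \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}
     \begin{bmatrix} I & Z \\ 0 & I \end{bmatrix}
   = \begin{bmatrix} T_{11} & T_{11}Z - ZT_{22} + T_{12} \\ 0 & T_{22} \end{bmatrix}
   = \begin{bmatrix} T_{11} & 0 \\ 0 & T_{22} \end{bmatrix}. \]
□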


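Example 7.1.3 If
\[ T = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 3 & 8 \\ 0 & -2 & 3 \end{bmatrix}
   \quad\text{and}\quad
   Y = \begin{bmatrix} 1.0 & 0.5 & -0.5 \\ 0.0 & 1.0 & 0.0 \\ 0.0 & 0.0 & 1.0 \end{bmatrix}, \]
then
\[ Y^{-1}TY = \begin{bmatrix} 1.0 & 0.0 & 0.0 \\ 0.0 & 3.0 & 8.0 \\ 0.0 & -2.0 & 3.0 \end{bmatrix}. \]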
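
The construction in Lemma 7.1.5 amounts to solving the Sylvester equation $T_{11}Z - ZT_{22} = -T_{12}$. The sketch below is one way to do this for small matrices; it rewrites the equation as a linear system with Kronecker products, which is an illustrative choice rather than the method the text goes on to develop. The name `block_diagonalize` is hypothetical.

```python
import numpy as np

def block_diagonalize(T, p):
    """Return (Y, Z) with Y = [[I, Z], [0, I]] and T11 Z - Z T22 = -T12."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    q = n - p
    T11, T12, T22 = T[:p, :p], T[:p, p:], T[p:, p:]
    # With column-major vec: vec(T11 Z - Z T22) = (I_q kron T11 - T22^T kron I_p) vec(Z).
    K = np.kron(np.eye(q), T11) - np.kron(T22.T, np.eye(p))
    z = np.linalg.solve(K, -T12.flatten(order="F"))
    Z = z.reshape((p, q), order="F")
    Y = np.eye(n)
    Y[:p, p:] = Z
    return Y, Z

T = np.array([[1.0, 2.0, 3.0], [0.0, 3.0, 8.0], [0.0, -2.0, 3.0]])
Y, Z = block_diagonalize(T, p=1)
D = np.linalg.solve(Y, T @ Y)   # Y^{-1} T Y
```

The linear system is singular exactly when $T_{11}$ and $T_{22}$ share an eigenvalue, in which case `np.linalg.solve` raises an error or returns an unreliable result.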


By repeatedly applying Lemma 7.1.5, we can establish the following more 
general result: 


Theorem 7.1.6 (Block Diagonal Decomposition) Suppose
\[ Q^H A Q = T = \begin{bmatrix}
     T_{11} & T_{12} & \cdots & T_{1q} \\
     0 & T_{22} & \cdots & T_{2q} \\
     \vdots & \vdots & \ddots & \vdots \\
     0 & 0 & \cdots & T_{qq}
   \end{bmatrix} \tag{7.1.5} \]
is a Schur decomposition of $A \in \mathbb{C}^{n\times n}$ and assume that the $T_{ii}$ are square.
If $\lambda(T_{ii}) \cap \lambda(T_{jj}) = \emptyset$ whenever $i \neq j$, then there exists a nonsingular matrix
$Y \in \mathbb{C}^{n\times n}$ such that
\[ (QY)^{-1}A(QY) = \mathrm{diag}(T_{11},\ldots,T_{qq}). \tag{7.1.6} \]

Proof. A proof can be obtained by using Lemma 7.1.5 and induction. □


If each diagonal block $T_{ii}$ is associated with a distinct eigenvalue, then we
obtain


Corollary 7.1.7 If $A \in \mathbb{C}^{n\times n}$ then there exists a nonsingular X such that
\[ X^{-1}AX = \mathrm{diag}(\lambda_1 I + N_1,\ldots,\lambda_q I + N_q), \qquad N_i \in \mathbb{C}^{n_i\times n_i} \tag{7.1.7} \]
where $\lambda_1,\ldots,\lambda_q$ are distinct, the integers $n_1,\ldots,n_q$ satisfy $n_1 + \cdots + n_q = n$,
and each $N_i$ is strictly upper triangular.


A number of important terms are connected with decomposition (7.1.7).
The integer $n_i$ is referred to as the algebraic multiplicity of $\lambda_i$. If $n_i = 1$,
then $\lambda_i$ is said to be simple. The geometric multiplicity of $\lambda_i$ equals the
dimension of null($N_i$), i.e., the number of linearly independent eigenvectors
associated with $\lambda_i$. If the algebraic multiplicity of $\lambda_i$ exceeds its geometric
multiplicity, then $\lambda_i$ is said to be a defective eigenvalue. A matrix with
a defective eigenvalue is referred to as a defective matrix. Nondefective
matrices are also said to be diagonalizable in light of the following result:


Corollary 7.1.8 (Diagonal Form) $A \in \mathbb{C}^{n\times n}$ is nondefective if and only
if there exists a nonsingular $X \in \mathbb{C}^{n\times n}$ such that
\[ X^{-1}AX = \mathrm{diag}(\lambda_1,\ldots,\lambda_n). \tag{7.1.8} \]

Proof. A is nondefective if and only if there exist independent vectors
$x_1,\ldots,x_n \in \mathbb{C}^n$ and scalars $\lambda_1,\ldots,\lambda_n$ such that $Ax_i = \lambda_i x_i$ for i = 1:n. This
is equivalent to the existence of a nonsingular $X = [\,x_1,\ldots,x_n\,] \in \mathbb{C}^{n\times n}$
such that AX = XD where $D = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$. □


Note that if $y_i^H$ is the ith row of $X^{-1}$, then $y_i^H A = \lambda_i y_i^H$. Thus, the columns
of $X^{-H}$ are left eigenvectors and the columns of X are right eigenvectors.


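Example 7.1.4 If
\[ A = \begin{bmatrix} 5 & -1 \\ -2 & 6 \end{bmatrix} \quad\text{and}\quad
   X = \begin{bmatrix} 1 & -1 \\ 1 & 2 \end{bmatrix}, \]
then $X^{-1}AX = \mathrm{diag}(4, 7)$.

If we partition the matrix X in (7.1.7),
\[ X = [\,X_1,\ldots,X_q\,], \qquad X_i \in \mathbb{C}^{n\times n_i}, \]
then $\mathbb{C}^n = \mathrm{ran}(X_1) \oplus \cdots \oplus \mathrm{ran}(X_q)$, a direct sum of invariant subspaces. If
the bases for these subspaces are chosen in a special way, then it is possible
to introduce even more zeroes into the upper triangular portion of $X^{-1}AX$.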
Theorem 7.1.9 (Jordan Decomposition) If $A \in \mathbb{C}^{n\times n}$, then there exists
a nonsingular $X \in \mathbb{C}^{n\times n}$ such that $X^{-1}AX = \mathrm{diag}(J_1,\ldots,J_t)$ where
\[ J_i = \begin{bmatrix}
     \lambda_i & 1 & \cdots & 0 \\
     0 & \lambda_i & \ddots & \vdots \\
     \vdots & & \ddots & 1 \\
     0 & \cdots & 0 & \lambda_i
   \end{bmatrix} \]
is $m_i$-by-$m_i$ and $m_1 + \cdots + m_t = n$.

Proof. See Halmos (1958, pp. 112 ff.) □


The $J_i$ are referred to as Jordan blocks. The number and dimensions of the
Jordan blocks associated with each distinct eigenvalue are unique, although
their ordering along the diagonal is not.


7.1.5 Some Comments on Nonunitary Similarity 


The Jordan block structure of a defective matrix is difficult to determine
numerically. The set of n-by-n diagonalizable matrices is dense in $\mathbb{C}^{n\times n}$,
and thus, small changes in a defective matrix can radically alter its Jordan
form. We have more to say about this in §7.6.5.


A related difficulty that arises in the eigenvalue problem is that a nearly
defective matrix can have a poorly conditioned matrix of eigenvectors. For
example, any matrix X that diagonalizes
\[ A = \begin{bmatrix} 1+\epsilon & 1 \\ 0 & 1-\epsilon \end{bmatrix}, \qquad 0 < \epsilon \ll 1 \tag{7.1.9} \]
has a 2-norm condition of order $1/\epsilon$.
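
The growth of the eigenvector condition number can be observed directly. The sketch below, which is an illustration added here and not from the text, relies on `numpy.linalg.eig` returning unit 2-norm eigenvector columns; a different column scaling would give a different (but still large) condition number.

```python
import numpy as np

def eigenvector_condition(eps):
    """2-norm condition number of a unit-column eigenvector matrix of (7.1.9)."""
    A = np.array([[1.0 + eps, 1.0], [0.0, 1.0 - eps]])
    _, X = np.linalg.eig(A)   # columns of X are unit 2-norm eigenvectors
    return np.linalg.cond(X, 2)

conditions = {eps: eigenvector_condition(eps) for eps in (1e-2, 1e-4, 1e-6)}
```

With this normalization the computed values track $1/\epsilon$ closely.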


These observations serve to highlight the difficulties associated with
ill-conditioned similarity transformations. Since
\[ fl(X^{-1}AX) = X^{-1}AX + E, \tag{7.1.10} \]
where
\[ \|E\|_2 \approx \mathbf{u}\,\kappa_2(X)\,\|A\|_2, \tag{7.1.11} \]
it is clear that large errors can be introduced into an eigenvalue calculation
when we depart from unitary similarity.
7.1.6 Singular Values and Eigenvalues


Since the singular values of A and its Schur decomposition $Q^H A Q =
\mathrm{diag}(\lambda_i) + N$ are the same, it follows that
\[ \sigma_{\min}(A) \le \min_i |\lambda_i| \le \max_i |\lambda_i| \le \sigma_{\max}(A). \]
From what we know about the condition of triangular matrices, it may be
the case that
\[ \max_{i,j} \frac{|\lambda_i|}{|\lambda_j|} \ll \kappa_2(A). \]
This is a reminder that for nonnormal matrices, eigenvalues do not have the
"predictive power" of singular values when it comes to Ax = b sensitivity
matters. Eigenvalues of nonnormal matrices have other shortcomings. See
§11.3.4.


Problems 


P7.1.1 Show that if $T \in \mathbb{C}^{n\times n}$ is upper triangular and normal, then T is diagonal.

P7.1.2 Verify that if X diagonalizes the 2-by-2 matrix in (7.1.9) and $\epsilon \le 1/2$ then
$\kappa_1(X) \ge 1/\epsilon$.

P7.1.3 Suppose $A \in \mathbb{C}^{n\times n}$ has distinct eigenvalues. Show that if $Q^H A Q = T$ is its
Schur decomposition and AB = BA, then $Q^H B Q$ is upper triangular.

P7.1.4 Show that if A and $B^H$ are in $\mathbb{C}^{m\times n}$ with $m \ge n$, then
\[ \lambda(AB) = \lambda(BA) \cup \{\underbrace{0,\ldots,0}_{m-n}\}. \]

P7.1.5 Given $A \in \mathbb{C}^{n\times n}$ use the Schur decomposition to show that for every $\epsilon > 0$,
there exists a diagonalizable matrix B such that $\|A - B\|_2 \le \epsilon$. This shows that the set
of diagonalizable matrices is dense in $\mathbb{C}^{n\times n}$ and that the Jordan canonical form is not
a continuous matrix decomposition.

P7.1.6 Suppose $A_k \to A$ and that $Q_k^H A_k Q_k = T_k$ is a Schur decomposition of $A_k$.
Show that $\{Q_k\}$ has a converging subsequence $\{Q_{k_i}\}$ with the property that
\[ \lim_{i\to\infty} Q_{k_i} = Q \]
where $Q^H A Q = T$ is upper triangular. This shows that the eigenvalues of a matrix are
continuous functions of its entries.

P7.1.7 Justify (7.1.10) and (7.1.11).

P7.1.8 Show how to compute the eigenvalues of
\[ M = \begin{bmatrix} A & C \\ B & D \end{bmatrix}, \]
whose block rows and block columns have dimensions k and j, where A, B, C, and D
are given real diagonal matrices.

P7.1.9 Use the JCF to show that if all the eigenvalues of a matrix A are strictly less
than unity, then $\lim_{k\to\infty} A^k = 0$.
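P7.1.10 The initial value problem
\[ \dot{x}(t) = y(t), \quad x(0) = 1, \qquad \dot{y}(t) = -x(t), \quad y(0) = 0 \]
has solution x(t) = cos(t) and y(t) = -sin(t). Let h > 0. Here are three reasonable
iterations that can be used to compute approximations $x_k \approx x(kh)$ and $y_k \approx y(kh)$
assuming that $x_0 = 1$ and $y_0 = 0$:

Method 1:  $x_{k+1} = x_k + h y_k$,      $y_{k+1} = y_k - h x_k$
Method 2:  $x_{k+1} = x_k + h y_k$,      $y_{k+1} = y_k - h x_{k+1}$
Method 3:  $x_{k+1} = x_k + h y_{k+1}$,  $y_{k+1} = y_k - h x_{k+1}$

Express each method in the form
\[ \begin{bmatrix} x_{k+1} \\ y_{k+1} \end{bmatrix} = A_h \begin{bmatrix} x_k \\ y_k \end{bmatrix} \]
where $A_h$ is a 2-by-2 matrix. For each case, compute $\lambda(A_h)$ and use the previous problem
to discuss $\lim x_k$ and $\lim y_k$ as $k \to \infty$.

P7.1.11 If $J \in \mathbb{R}^{d\times d}$ is a Jordan block, what is $\kappa_\infty(J)$?

P7.1.12 Show that if
\[ R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix} \]
(with $R_{11}$ p-by-p and $R_{22}$ q-by-q) is normal and $\lambda(R_{11}) \cap \lambda(R_{22}) = \emptyset$, then $R_{12} = 0$.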


Notes and References for Sec. 7.1 


The mathematical properties of the algebraic eigenvalue problem are elegantly covered in 
Wilkinson (1965, chapter 1) and Stewart (1973, chapter 6). For those who need further 
review we also recommend 


R. Bellman (1970). Introduction to Matrix Analysis, 2nd ed., McGraw-Hill, New York.

I.C. Gohberg, P. Lancaster, and L. Rodman (1986). Invariant Subspaces of Matrices
With Applications, John Wiley and Sons, New York.

M. Marcus and H. Minc (1964). A Survey of Matrix Theory and Matrix Inequalities,
Allyn and Bacon, Boston.

L. Mirsky (1963). An Introduction to Linear Algebra, Oxford University Press, Oxford.


The Schur decomposition originally appeared in 


I. Schur (1909). "On the Characteristic Roots of a Linear Substitution with an
Application to the Theory of Integral Equations," Math. Ann. 66, 488-510 (German).


A proof very similar to ours is given on page 105 of 


H.W. Turnbull and A.C. Aitken (1961). An Introduction to the Theory of Canonical 
Forms, Dover, New York. 
Connections between singular values, eigenvalues, and pseudoeigenvalues (see §11.3.4)
are discussed in


K-C. Toh and L.N. Trefethen (1994). "Pseudozeros of Polynomials and Pseudospectra
of Companion Matrices," Numer. Math. 68, 403-425.

F. Kittaneh (1995). "Singular Values of Companion Matrices and Bounds on Zeros of
Polynomials," SIAM J. Matrix Anal. Appl. 16, 333-340.


7.2  Perturbation Theory 


The act of computing eigenvalues is the act of computing zeros of the char- 
acteristic polynomial. Galois theory tells us that such a process has to be 
iterative if n > 4 and so errors will arise because of finite termination. In 
order to develop intelligent stopping criteria we need an informative per- 
turbation theory that tells us how to think about approximate eigenvalues 
and invariant subspaces. 


7.2.1 Eigenvalue Sensitivity 


Several eigenvalue routines produce a sequence of similarity transformations
$X_k$ with the property that the matrices $X_k^{-1}AX_k$ are progressively "more
diagonal." The question naturally arises, how well do the diagonal elements
of a matrix approximate its eigenvalues?


Theorem 7.2.1 (Gershgorin Circle Theorem) If $X^{-1}AX = D + F$
where $D = \mathrm{diag}(d_1,\ldots,d_n)$ and F has zero diagonal entries, then
\[ \lambda(A) \subseteq \bigcup_{i=1}^{n} D_i \]
where
\[ D_i = \Big\{ z \in \mathbb{C} : |z - d_i| \le \sum_{j=1}^{n} |f_{ij}| \Big\}. \]


Proof. Suppose $\lambda \in \lambda(A)$ and assume without loss of generality that $\lambda \neq d_i$
for i = 1:n. Since $(D - \lambda I) + F$ is singular, it follows from Lemma 2.3.3
that
\[ 1 \le \|(D - \lambda I)^{-1}F\|_\infty = \sum_{j=1}^{n} \frac{|f_{kj}|}{|d_k - \lambda|} \]
for some k, $1 \le k \le n$. But this implies that $\lambda \in D_k$. □


It can also be shown that if the Gershgorin disk $D_i$ is isolated from the other
disks, then it contains precisely one of A's eigenvalues. See Wilkinson (1965,
pp. 71ff.).

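Example 7.2.1 If
\[ A = \begin{bmatrix} 10 & 2 & 3 \\ -1 & 0 & 2 \\ 1 & -2 & 1 \end{bmatrix}, \]
then $\lambda(A) \approx \{10.226,\ .3870 + 2.2216i,\ .3870 - 2.2216i\}$ and the Gershgorin disks are
$D_1 = \{z : |z - 10| \le 5\}$, $D_2 = \{z : |z| \le 3\}$, and $D_3 = \{z : |z - 1| \le 3\}$.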
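
The disks are cheap to compute, which is what makes the theorem useful as a quick check. The sketch below takes X = I, so that D is the diagonal of A and F its off-diagonal part. The test matrix is an assumption chosen so that its diagonal and row sums reproduce the disks quoted in Example 7.2.1; the helper name `gershgorin_disks` is hypothetical.

```python
import numpy as np

def gershgorin_disks(M):
    """Return (centers, radii) of the Gershgorin row disks of M (X = I case)."""
    M = np.asarray(M, dtype=complex)
    centers = np.diag(M)
    radii = np.sum(np.abs(M), axis=1) - np.abs(centers)
    return centers, radii

# Assumed test matrix consistent with the disks listed in Example 7.2.1.
A = np.array([[10.0, 2.0, 3.0], [-1.0, 0.0, 2.0], [1.0, -2.0, 1.0]])
centers, radii = gershgorin_disks(A)
eigenvalues = np.linalg.eigvals(A)

# Every eigenvalue must lie in at least one disk.
in_union = [bool(np.any(np.abs(lam - centers) <= radii + 1e-12))
            for lam in eigenvalues]
```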


For some very important eigenvalue routines it is possible to show that the
computed eigenvalues are the exact eigenvalues of a matrix A + E where E
is small in norm. Consequently, we must understand how the eigenvalues
of a matrix can be affected by small perturbations. A sample result that
sheds light on this issue is the following theorem.


Theorem 7.2.2 (Bauer-Fike) If $\mu$ is an eigenvalue of $A + E \in \mathbb{C}^{n\times n}$
and $X^{-1}AX = D = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$, then
\[ \min_{\lambda\in\lambda(A)} |\lambda - \mu| \le \kappa_p(X)\,\|E\|_p, \]
where $\|\cdot\|_p$ denotes any of the p-norms.


Proof. We need only consider the case when $\mu$ is not in $\lambda(A)$. If the matrix
$X^{-1}(A + E - \mu I)X$ is singular, then so is $I + (D - \mu I)^{-1}(X^{-1}EX)$. Thus,
from Lemma 2.3.3 we obtain
\[ 1 \le \|(D - \mu I)^{-1}(X^{-1}EX)\|_p \le \|(D - \mu I)^{-1}\|_p\,\|X\|_p\,\|E\|_p\,\|X^{-1}\|_p. \]
Since $(D - \mu I)^{-1}$ is diagonal and the p-norm of a diagonal matrix is the
absolute value of the largest diagonal entry, it follows that
\[ \|(D - \mu I)^{-1}\|_p = \frac{1}{\min_{\lambda\in\lambda(A)} |\lambda - \mu|}, \]
from which the theorem follows. □


An analogous result can be obtained via the Schur decomposition: 


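Theorem 7.2.3 Let $Q^H A Q = D + N$ be a Schur decomposition of $A \in \mathbb{C}^{n\times n}$
as in (7.1.3). If $\mu \in \lambda(A + E)$ and p is the smallest positive integer such
that $|N|^p = 0$, then
\[ \min_{\lambda\in\lambda(A)} |\lambda - \mu| \le \max(\theta, \theta^{1/p}) \]
where
\[ \theta = \|E\|_2 \sum_{k=0}^{p-1} \|N\|_2^k. \]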
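Proof. Define
\[ \delta = \min_{\lambda\in\lambda(A)} |\lambda - \mu| = \frac{1}{\|(\mu I - D)^{-1}\|_2}. \]
The theorem is clearly true if $\delta = 0$. If $\delta > 0$ then $I - (\mu I - A)^{-1}E$ is
singular and by Lemma 2.3.3 we have
\[ 1 \le \|(\mu I - A)^{-1}E\|_2 \le \|(\mu I - A)^{-1}\|_2\,\|E\|_2
     = \|((\mu I - D) - N)^{-1}\|_2\,\|E\|_2. \tag{7.2.1} \]
Since $(\mu I - D)^{-1}$ is diagonal and $|N|^p = 0$ it is not hard to show that
$((\mu I - D)^{-1}N)^p = 0$. Thus,
\[ ((\mu I - D) - N)^{-1} = \sum_{k=0}^{p-1} \big((\mu I - D)^{-1}N\big)^k (\mu I - D)^{-1} \]
and so
\[ \|((\mu I - D) - N)^{-1}\|_2 \le \frac{1}{\delta} \sum_{k=0}^{p-1} \left(\frac{\|N\|_2}{\delta}\right)^k. \]
If $\delta > 1$ then
\[ \|((\mu I - D) - N)^{-1}\|_2 \le \frac{1}{\delta} \sum_{k=0}^{p-1} \|N\|_2^k \]
and so from (7.2.1), $\delta \le \theta$. If $\delta \le 1$ then
\[ \|((\mu I - D) - N)^{-1}\|_2 \le \frac{1}{\delta^p} \sum_{k=0}^{p-1} \|N\|_2^k \]
and so from (7.2.1), $\delta^p \le \theta$. Thus, $\delta \le \max(\theta, \theta^{1/p})$. □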


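Example 7.2.2 If
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix}
   \quad\text{and}\quad
   E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ .001 & 0 & 0 \end{bmatrix}, \]
then $\lambda(A + E) \approx \{1.0001,\ 4.0582,\ 3.9427\}$ and A's matrix of eigenvectors satisfies
$\kappa_2(X) \approx 10^7$. The Bauer-Fike bound in Theorem 7.2.2 has order $10^4$, while the Schur
bound in Theorem 7.2.3 has order $10^0$.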
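
The Bauer-Fike inequality can be checked numerically for the matrices of this example. The sketch below is an added illustration; it uses the eigenvector matrix returned by `numpy.linalg.eig`, whose columns are scaled to unit 2-norm. Since $\kappa_2(X)$ depends on how the eigenvectors are scaled, the bound computed this way need not have the order quoted in the example, but the inequality must hold for any diagonalizing X.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 4.001]])
E = np.zeros((3, 3))
E[2, 0] = 0.001

lam, X = np.linalg.eig(A)       # eigenvalues and unit-column eigenvectors of A
mu = np.linalg.eigvals(A + E)   # eigenvalues of the perturbed matrix

# Bauer-Fike bound in the 2-norm for this particular X.
bound = np.linalg.cond(X, 2) * np.linalg.norm(E, 2)

# Distance from each perturbed eigenvalue to the spectrum of A.
gaps = np.array([np.min(np.abs(lam - m)) for m in mu])
```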


Theorems 7.2.2 and 7.2.3 each indicate potential eigenvalue sensitivity if A
is nonnormal. Specifically, if $\kappa_2(X)$ or $\|N\|_2^{p-1}$ is large, then small changes
in A can induce large changes in the eigenvalues.


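Example 7.2.3 If
\[ A = \begin{bmatrix} 0 & I_9 \\ 0 & 0 \end{bmatrix}
   \quad\text{and}\quad
   E = \begin{bmatrix} 0 & 0 \\ 10^{-10} & 0 \end{bmatrix}, \]
then for all $\lambda \in \lambda(A)$ and $\mu \in \lambda(A + E)$, $|\lambda - \mu| = 10^{-1}$. In this example a change of
order $10^{-10}$ in A results in a change of order $10^{-1}$ in its eigenvalues.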
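
A short experiment makes the size of this effect concrete. The sketch below is an added illustration that assumes the setup just described: a 10-by-10 matrix with ones on the superdiagonal, perturbed by $10^{-10}$ in the bottom-left corner. The perturbed characteristic polynomial is $\mu^{10} = 10^{-10}$, so every eigenvalue should have modulus $10^{-1}$.

```python
import numpy as np

n = 10
A = np.diag(np.ones(n - 1), k=1)   # ones on the superdiagonal, zero elsewhere
E = np.zeros((n, n))
E[n - 1, 0] = 1e-10                # single tiny entry in the bottom-left corner

mu = np.linalg.eigvals(A + E)
moduli = np.abs(mu)                # all eigenvalues of A itself are zero
```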


7.2.2 The Condition of a Simple Eigenvalue 


Extreme eigenvalue sensitivity for a matrix A cannot occur if A is normal. 
On the other hand, nonnormality does not necessarily imply eigenvalue sen- 
sitivity. Indeed, a nonnormal matrix can have a mixture of well-conditioned 
and ill-conditioned eigenvalues. For this reason, it is beneficial to refine our 
perturbation theory so that it is applicable to individual eigenvalues and 
not the spectrum as a whole. 

To this end, suppose that $\lambda$ is a simple eigenvalue of $A \in \mathbb{C}^{n\times n}$ and
that x and y satisfy $Ax = \lambda x$ and $y^H A = \lambda y^H$ with $\|x\|_2 = \|y\|_2 = 1$.
If $Y^H A X = J$ is the Jordan decomposition with $Y^H = X^{-1}$, then x and
y are nonzero multiples of X(:,i) and Y(:,i) for some i. It follows from
$1 = Y(:,i)^H X(:,i)$ that $y^H x \neq 0$, a fact that we shall use shortly.

Using classical results from function theory, it can be shown that in a
neighborhood of the origin there exist differentiable $x(\epsilon)$ and $\lambda(\epsilon)$ such that
\[ (A + \epsilon F)x(\epsilon) = \lambda(\epsilon)x(\epsilon), \qquad \|F\|_2 = 1, \]
where $\lambda(0) = \lambda$ and $x(0) = x$. By differentiating this equation with respect
to $\epsilon$ and setting $\epsilon = 0$ in the result, we obtain
\[ A\dot{x}(0) + Fx = \dot{\lambda}(0)x + \lambda\dot{x}(0). \]
Applying $y^H$ to both sides of this equation, dividing by $y^H x$, and taking
absolute values gives
\[ |\dot{\lambda}(0)| = \left|\frac{y^H F x}{y^H x}\right| \le \frac{1}{|y^H x|}. \]
The upper bound is attained if $F = yx^H$. For this reason we refer to the
reciprocal of
\[ s(\lambda) = |y^H x| \]
as the condition of the eigenvalue $\lambda$.

Roughly speaking, the above analysis shows that if order $\epsilon$ perturbations
are made in A, then an eigenvalue $\lambda$ may be perturbed by an amount
$\epsilon/s(\lambda)$. Thus, if $s(\lambda)$ is small, then $\lambda$ is appropriately regarded as
ill-conditioned. Note that $s(\lambda)$ is the cosine of the angle between the left and
right eigenvectors associated with $\lambda$ and is unique only if $\lambda$ is simple.

A small $s(\lambda)$ implies that A is near a matrix having a multiple eigenvalue.
In particular, if $\lambda$ is distinct and $s(\lambda) < 1$, then there exists an E
such that $\lambda$ is a repeated eigenvalue of A + E and
\[ \frac{\|E\|_2}{\|A\|_2} \le \frac{s(\lambda)}{\sqrt{1 - s(\lambda)^2}}. \]
This result is proved in Wilkinson (1972).


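Example 7.2.4 If
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix}
   \quad\text{and}\quad
   E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ .001 & 0 & 0 \end{bmatrix}, \]
then $\lambda(A + E) \approx \{1.0001,\ 4.0582,\ 3.9427\}$ and $s(1) \approx .8 \times 10^{0}$, $s(4) \approx .2 \times 10^{-3}$, and
$s(4.001) \approx .2 \times 10^{-3}$. Observe that $\|E\|_2/s(\lambda)$ is a good estimate of the perturbation
that each eigenvalue undergoes.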
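
The quantities $s(\lambda)$ can be computed from an eigenvector matrix when the eigenvalues are distinct. The sketch below is an added illustration, with `eigenvalue_conditions` a hypothetical helper name: if the columns of X have unit 2-norm, then row i of $X^{-1}$ is a left eigenvector scaled so that its product with column i of X is 1, and therefore $s(\lambda_i)$ is the reciprocal of that row's 2-norm.

```python
import numpy as np

def eigenvalue_conditions(A):
    """Return (eigenvalues, s) with s[i] = |y_i^H x_i| for unit x_i, y_i."""
    lam, X = np.linalg.eig(A)
    X = X / np.linalg.norm(X, axis=0)   # unit 2-norm right eigenvectors
    Yh = np.linalg.inv(X)               # row i is y_i^H with y_i^H x_i = 1
    s = 1.0 / np.linalg.norm(Yh, axis=1)
    return lam, s

A = np.array([[1.0, 2.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 4.001]])
lam, s = eigenvalue_conditions(A)
s_of = {target: s[np.argmin(np.abs(lam - target))] for target in (1.0, 4.0, 4.001)}
```

This approach inverts X and so is only a sketch for small, diagonalizable matrices; it is not a robust condition estimator.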


7.2.3 Sensitivity of Repeated Eigenvalues 


If $\lambda$ is a repeated eigenvalue, then the eigenvalue sensitivity question is
more complicated. For example, if
\[ A = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}
   \quad\text{and}\quad
   F = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \]
then $\lambda(A + \epsilon F) = \{1 \pm \sqrt{\epsilon a}\}$. Note that if $a \neq 0$, then it follows that the
eigenvalues of $A + \epsilon F$ are not differentiable at zero; their rate of change at
the origin is infinite. In general, if $\lambda$ is a defective eigenvalue of A, then
$O(\epsilon)$ perturbations in A can result in $O(\epsilon^{1/p})$ perturbations in $\lambda$ if $\lambda$ is
associated with a p-dimensional Jordan block. See Wilkinson (1965, pp.
77ff.) for a more detailed discussion.
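
The square-root behavior is easy to observe. The following sketch, added for illustration with a = 1, measures how far the eigenvalues of $A + \epsilon F$ move from 1 for two values of $\epsilon$; the helper name is hypothetical.

```python
import numpy as np

def perturbed_eigenvalues(eps, a=1.0):
    """Eigenvalues of [[1, a], [0, 1]] + eps * [[0, 0], [1, 0]]."""
    A = np.array([[1.0, a], [0.0, 1.0]])
    F = np.array([[0.0, 0.0], [1.0, 0.0]])
    return np.linalg.eigvals(A + eps * F)

# Largest eigenvalue displacement for each perturbation size.
shifts = {eps: np.max(np.abs(perturbed_eigenvalues(eps) - 1.0))
          for eps in (1e-4, 1e-8)}
```

Reducing $\epsilon$ by a factor of $10^4$ reduces the displacement by only a factor of $10^2$.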


7.2.4 Invariant Subspace Sensitivity 


A collection of sensitive eigenvectors can define an insensitive invariant
subspace provided the corresponding cluster of eigenvalues is isolated. To
be precise, suppose
\[ Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \tag{7.2.2} \]
(with $T_{11}$ r-by-r and $T_{22}$ (n-r)-by-(n-r)) is a Schur decomposition of A with
\[ Q = [\,Q_1 \ \ Q_2\,], \tag{7.2.3} \]
where $Q_1$ has r columns and $Q_2$ has n-r columns.
It is clear from our discussion of eigenvector perturbation that the
sensitivity of the invariant subspace ran($Q_1$) depends on the distance between
$\lambda(T_{11})$ and $\lambda(T_{22})$. The proper measure of this distance turns out to be
the smallest singular value of the linear transformation $X \to T_{11}X - XT_{22}$.
(Recall that this transformation figures in Lemma 7.1.5.) In particular, if
we define the separation between the matrices $T_{11}$ and $T_{22}$ by
\[ \mathrm{sep}(T_{11}, T_{22}) = \min_{X\neq 0} \frac{\|T_{11}X - XT_{22}\|_F}{\|X\|_F} \tag{7.2.4} \]
then we have the following general result:


Theorem 7.2.4 Suppose that (7.2.2) and (7.2.3) hold and that for any
matrix $E \in \mathbb{C}^{n\times n}$ we partition $Q^H E Q$ as follows:
\[ Q^H E Q = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix}, \]
with $E_{11}$ r-by-r and $E_{22}$ (n-r)-by-(n-r). If sep($T_{11}, T_{22}$) > 0 and
\[ \|E\|_2 \left(1 + \frac{5\,\|T_{12}\|_2}{\mathrm{sep}(T_{11}, T_{22})}\right) \le \frac{\mathrm{sep}(T_{11}, T_{22})}{5}, \]
then there exists a $P \in \mathbb{C}^{(n-r)\times r}$ with
\[ \|P\|_2 \le 4\,\frac{\|E_{21}\|_2}{\mathrm{sep}(T_{11}, T_{22})} \]
such that the columns of $\hat{Q}_1 = (Q_1 + Q_2 P)(I + P^H P)^{-1/2}$ are an orthonormal
basis for a subspace invariant for A + E.


Proof. This result is a slight recasting of Theorem 4.11 in Stewart (1973)
which should be consulted for proof details. See also Stewart and Sun
(1990, p.230). The matrix $(I + P^H P)^{-1/2}$ is the inverse of the square root
of the symmetric positive definite matrix $I + P^H P$. See §4.2.10. □


Corollary 7.2.5 If the assumptions in Theorem 7.2.4 hold, then
\[ \mathrm{dist}(\mathrm{ran}(Q_1), \mathrm{ran}(\hat{Q}_1)) \le 4\,\frac{\|E_{21}\|_2}{\mathrm{sep}(T_{11}, T_{22})}. \]

Proof. Using the SVD of P, it can be shown that
\[ \|P(I + P^H P)^{-1/2}\|_2 \le \|P\|_2. \tag{7.2.5} \]
The corollary follows because the required distance is the norm of
$Q_2^H \hat{Q}_1 = P(I + P^H P)^{-1/2}$. □


Thus, the reciprocal of sep($T_{11}, T_{22}$) can be thought of as a condition number
that measures the sensitivity of ran($Q_1$) as an invariant subspace.
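
For small blocks, sep can be evaluated directly from definition (7.2.4). The sketch below is an added illustration and not an efficient estimator: it forms the Kronecker-product matrix of the map $X \to T_{11}X - XT_{22}$ and takes its smallest singular value, which is valid because $\|X\|_F$ equals the 2-norm of vec(X). The function name `sep` and the sample blocks are assumptions made here.

```python
import numpy as np

def sep(T11, T22):
    """Smallest singular value of the map X -> T11 X - X T22."""
    T11 = np.atleast_2d(np.asarray(T11, dtype=complex))
    T22 = np.atleast_2d(np.asarray(T22, dtype=complex))
    p, q = T11.shape[0], T22.shape[0]
    # Column-major vec: vec(T11 X - X T22) = (I_q kron T11 - T22^T kron I_p) vec(X).
    K = np.kron(np.eye(q), T11) - np.kron(T22.T, np.eye(p))
    return np.linalg.svd(K, compute_uv=False)[-1]

# Assumed sample blocks: the eigenvalue gap is 0.01, yet sep is far smaller.
T11 = np.array([[3.0, 10.0], [0.0, 1.0]])
T22 = np.array([[0.0, -20.0], [0.0, 3.01]])
separation = sep(T11, T22)
eigenvalue_gap = min(abs(a - b) for a in (3.0, 1.0) for b in (0.0, 3.01))
```

For normal blocks sep reduces to the minimum eigenvalue gap, while for nonnormal blocks such as these it can be much smaller.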


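Example 7.2.5 Suppose
\[ T_{11} = \begin{bmatrix} 3 & 10 \\ 0 & 1 \end{bmatrix}, \quad
   T_{22} = \begin{bmatrix} 0 & -20 \\ 0 & 3.01 \end{bmatrix}, \quad\text{and}\quad
   T_{12} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \]
and that
\[ A = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}. \]
Observe that $AQ_1 = Q_1T_{11}$ where $Q_1 = [\,e_1, e_2\,] \in \mathbb{R}^{4\times 2}$. A calculation shows that
sep($T_{11}, T_{22}$) $\approx$ .0003. If
\[ E_{21} = 10^{-6} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \]
and we examine the Schur decomposition of
\[ A + E = \begin{bmatrix} T_{11} & T_{12} \\ E_{21} & T_{22} \end{bmatrix}, \]
then we find that $Q_1$ gets perturbed to
\[ \hat{Q}_1 = \begin{bmatrix} -.9999 & -.0003 \\ .0003 & -.9999 \\ .0005 & -.0026 \\ .0000 & .0003 \end{bmatrix}. \]
Thus, we have dist(ran($Q_1$), ran($\hat{Q}_1$)) $\approx$ .0027 $\approx 10^{-6}$/sep($T_{11}, T_{22}$).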


7.2.5 | Eigenvector Sensitivity 


If we set r = 1 in the preceding subsection, then the analysis addresses the 
issue of eigenvector sensitivity. 


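Corollary 7.2.6 Suppose $A, E \in \mathbb{C}^{n\times n}$ and that $Q = [\,q_1 \ \ Q_2\,] \in \mathbb{C}^{n\times n}$ is
unitary with $q_1 \in \mathbb{C}^n$. Assume
\[ Q^H A Q = \begin{bmatrix} \lambda & v^H \\ 0 & T_{22} \end{bmatrix}, \qquad
   Q^H E Q = \begin{bmatrix} \epsilon & \gamma^H \\ \delta & E_{22} \end{bmatrix}, \]
where the diagonal blocks are 1-by-1 and (n-1)-by-(n-1).
(Thus, $q_1$ is an eigenvector.) If $\sigma = \sigma_{\min}(T_{22} - \lambda I) > 0$ and
\[ \|E\|_2 \left(1 + \frac{5\,\|v\|_2}{\sigma}\right) \le \frac{\sigma}{5}, \]
then there exists $p \in \mathbb{C}^{n-1}$ with
\[ \|p\|_2 \le 4\,\frac{\|\delta\|_2}{\sigma} \]
such that $\hat{q}_1 = (q_1 + Q_2 p)/\sqrt{1 + p^H p}$ is a unit 2-norm eigenvector for A + E.
Moreover,
\[ \mathrm{dist}(\mathrm{span}\{q_1\}, \mathrm{span}\{\hat{q}_1\}) \le 4\,\frac{\|\delta\|_2}{\sigma}. \]

Proof. The result follows from Theorem 7.2.4, Corollary 7.2.5 and the
observation that if $T_{11} = \lambda$, then sep($T_{11}, T_{22}$) = $\sigma_{\min}(T_{22} - \lambda I)$. □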
Note that $\sigma_{\min}(T_{22} - \lambda I)$ roughly measures the separation of $\lambda$ from the
eigenvalues of $T_{22}$. We have to say "roughly" because
\[ \mathrm{sep}(\lambda, T_{22}) = \sigma_{\min}(T_{22} - \lambda I) \le \min_{\mu\in\lambda(T_{22})} |\mu - \lambda| \]
and the upper bound can be a gross overestimate.

That the separation of the eigenvalues should have a bearing upon
eigenvector sensitivity should come as no surprise. Indeed, if $\lambda$ is a nondefective,
repeated eigenvalue, then there are an infinite number of possible
eigenvector bases for the associated invariant subspace. The preceding analysis
merely indicates that this indeterminacy begins to be felt as the
eigenvalues coalesce. In other words, the eigenvectors associated with nearby
eigenvalues are "wobbly."


Example 7.2.6 If
\[ A = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 0.99 \end{bmatrix}, \]
then the eigenvalue $\lambda = .99$ has condition $1/s(.99) \approx 1.118$ and associated eigenvector
$x = [.4472, -.8944]^T$. On the other hand, the eigenvalue $\hat{\lambda} = 1.00$ of the "nearby" matrix
\[ A + E = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 1.00 \end{bmatrix} \]
has an eigenvector $\hat{x} = [.7071, -.7071]^T$.
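
The rotation of the eigenvector in this example can be reproduced with a few lines of NumPy. The sketch below is an added illustration; the helper `unit_eigenvector` is a hypothetical name, and the sign normalization is only there to make the two vectors comparable.

```python
import numpy as np

def unit_eigenvector(M, target):
    """Unit 2-norm eigenvector of M for the eigenvalue closest to target."""
    lam, X = np.linalg.eig(M)
    i = np.argmin(np.abs(lam - target))
    x = X[:, i] / np.linalg.norm(X[:, i])
    return x if x[0] >= 0 else -x   # fix the sign so results are comparable

A = np.array([[1.01, 0.01], [0.00, 0.99]])
A_plus_E = np.array([[1.01, 0.01], [0.00, 1.00]])

x = unit_eigenvector(A, 0.99)
x_hat = unit_eigenvector(A_plus_E, 1.00)

# Angle (radians) between the two eigenvector directions.
angle = np.arccos(min(1.0, abs(x @ x_hat)))
```

A change of size .01 in one entry rotates the eigenvector by roughly a third of a radian, even though the eigenvalue itself is well conditioned.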


Problems 


P7.2.1 Suppose $Q^H A Q = \mathrm{diag}(\lambda_i) + N$ is a Schur decomposition of $A \in \mathbb{C}^{n\times n}$ and
define $\nu(A) = \|A^H A - AA^H\|_F$. The upper and lower bounds in
\[ \frac{\nu(A)^2}{6\,\|A\|_F^2} \le \|N\|_F^2 \le \sqrt{\frac{n^3 - n}{12}}\;\nu(A) \]
are established by Henrici (1962) and Eberlein (1965), respectively. Verify these results
for the case n = 2.

P7.2.2 Suppose $A \in \mathbb{C}^{n\times n}$ and $X^{-1}AX = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ with distinct $\lambda_i$. Show
that if the columns of X have unit 2-norm, then $\kappa_F(X)^2 = n \sum_{i=1}^{n} (1/s(\lambda_i))^2$.

P7.2.3 Suppose $Q^H A Q = \mathrm{diag}(\lambda_i) + N$ is a Schur decomposition of A and that $X^{-1}AX
= \mathrm{diag}(\lambda_i)$. Show $\kappa_2(X)^2 \ge 1 + (\|N\|_F/\|A\|_F)^2$. See Loizou (1969).

P7.2.4 If $X^{-1}AX = \mathrm{diag}(\lambda_i)$ and $|\lambda_1| \ge \cdots \ge |\lambda_n|$, then
\[ \frac{\sigma_i(A)}{\kappa_2(X)} \le |\lambda_i| \le \kappa_2(X)\,\sigma_i(A). \]
Prove this result for the n = 2 case. See Ruhe (1975).

P7.2.5 Show that if $A = \begin{bmatrix} a & c \\ 0 & b \end{bmatrix}$ and $a \neq b$, then
$s(a) = s(b) = (1 + |c/(a - b)|^2)^{-1/2}$.

P7.2.6 Suppose
\[ A = \begin{bmatrix} \lambda & v^H \\ 0 & T_{22} \end{bmatrix} \]
and that $\lambda \notin \lambda(T_{22})$. Show that if $\sigma$ = sep($\lambda, T_{22}$), then
\[ s(\lambda) = \frac{1}{\sqrt{1 + \|(T_{22} - \lambda I)^{-1}v\|_2^2}} \ge \frac{\sigma}{\sqrt{\sigma^2 + \|v\|_2^2}}. \]

P7.2.7 Show that the condition of a simple eigenvalue is preserved under unitary
similarity transformations.


P7.2.8 With the same hypothesis as in the Bauer-Fike theorem (Theorem 7.2.2), show
that
\[ \min_{\lambda\in\lambda(A)} |\lambda - \mu| \le \big\|\, |X^{-1}|\,|E|\,|X| \,\big\|_p. \]

P7.2.9 Verify (7.2.5).

P7.2.10 Show that if $B \in \mathbb{C}^{m\times m}$ and $C \in \mathbb{C}^{n\times n}$, then sep(B, C) is less than or equal
to $|\lambda - \mu|$ for all $\lambda \in \lambda(B)$ and $\mu \in \lambda(C)$.


Notes and References for Sec. 7.2 


Many of the results presented in this section may be found in Wilkinson (1965, chapter 
2), Stewart and Sun (1990) as well as in 


F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2, 
123-44. 

A.S. Householder (1964). The Theory of Matrices in Numerical Analysis. Blaisdell, 
New York. 


The following papers are concerned with the effect of perturbations on the eigenvalues 
of a general matrix: 


A. Ruhe (1970). “Perturbation Bounds for Means of Eigenvalues and Invariant Sub- 
spaces,” BIT 10, 343-54. 

A. Ruhe (1970). “Properties of a Matrix with a Very Ill-Conditioned Eigenproblem,”
Numer. Math. 15, 57-60.

J.H. Wilkinson (1972). “Note on Matrices with a Very Ill-Conditioned Eigenproblem,” 
Numer. Math. 19, 176-78. 

W. Kahan, B.N. Parlett, and E. Jiang (1982). “Residual Bounds on Approximate Eigen- 
systems of Nonnormal Matrices,” SIAM J. Numer. Anal. 19, 470-484. 

J.H. Wilkinson (1984). “On Neighboring Matrices with Quadratic Elementary Divisors,” 
Numer. Math. 44, 1-21. 

J.V. Burke and M.L. Overton (1992). “Stable Perturbations of Nonsymmetric Matrices,”
Lin. Alg. and Its Applic. 171, 249-273.


Wilkinson's work on nearest defective matrices is typical of a growing body of literature 
that is concerned with "nearness" problems. See 


N.J. Higham (1985). “Nearness Problems in Numerical Linear Algebra,” PhD Thesis, 
University of Manchester, England. 

C. Van Loan (1985). “How Near is a Stable Matrix to an Unstable Matrix?," Contem- 
porary Mathematics, Vol. 47, 465—477. 

J.W. Demmel (1987). "On the Distance to the Nearest Ill-Posed Problem,” Numer. 
Math. 51, 251-289. 
J.W. Demmel (1987). “A Counterexample for two Conjectures About Stability,” IEEE
Trans. Auto. Cont. AC-32, 340-342.

A. Ruhe (1987). “Closest Normal Matrix Found!,” BIT 27, 585-598. 

R. Byers (1988). “A Bisection Method for Measuring the Distance of a Stable Matrix to
the Unstable Matrices,” SIAM J. Sci. and Stat. Comp. 9, 875-881.

J.W. Demmel (1988). “The Probability that a Numerical Analysis Problem is Difficult,” 
Math. Comp. 50, 449-480. 

N.J. Higham (1989). “Matrix Nearness Problems and Applications,” in Applications of
Matrix Theory, M.J.C. Gover and S. Barnett (eds), Oxford University Press, Oxford
UK, 1-27.


Aspects of eigenvalue condition are discussed in 


C. Van Loan (1987). “On Estimating the Condition of Eigenvalues and Eigenvectors," 
Lin. Alg. and Its Applic. 88/89, 715—732. 

C.D. Meyer and G.W. Stewart (1988). “Derivatives and Perturbations of Eigenvectors," 
SIAM J. Num. Anal. 25, 679-691. 

G.W. Stewart and G. Zhang (1991). “Eigenvalues of Graded Matrices and the Condition 
Numbers of Multiple Eigenvalues,” Numer. Math. 58, 703-712. 

J.-G. Sun (1992). “On Condition Numbers of a Nondefective Multiple Eigenvalue,” 
Numer. Math. 61, 265-276. 


The relationship between the eigenvalue condition number, the departure from normal- 
ity, and the condition of the eigenvector matrix is discussed in 


P. Henrici (1962). “Bounds for Iterates, Inverses, Spectral Variation and Fields of Values 
of Non-normal Matrices,” Numer. Math. 4, 24-40. 

P. Eberlein (1965). “On Measures of Non-Normality for Matrices,” Amer. Math.
Monthly 72, 995-96.

R.A. Smith (1967). “The Condition Numbers of the Matrix Eigenvalue Problem," Nu- 
mer. Math. 10 232-40. 

G. Loizou (1969). “Nonnormality and Jordan Condition Numbers of Matrices,” J. ACM 
16, 580-40. 

A. van der Sluis (1975). “Perturbations of Eigenvalues of Non-normal Matrices,” Comm.
ACM 18, 30-36.


The paper by Henrici also contains a result similar to Theorem 7.2.3. Penetrating treat- 
ments of invariant subspace perturbation include 


T. Kato (1966). Perturbation Theory for Linear Operators, Springer-Verlag, New York. 

C. Davis and W.M. Kahan (1970). "The Rotation of Eigenvectors by a Perturbation, 
II,” SIAM J. Num. Anal. 7, 1—46. 

G.W. Stewart (1971). “Error Bounds for Approximate Invariant Subspaces of Closed 
Linear Operators,” SIAM. J. Num. Anal. 8, 796-808. 

G.W. Stewart (1973). “Error and Perturbation Bounds for Subspaces Associated with 
Certain Eigenvalue Problems,” SIAM Review 15, 727-64. 


Detailed analyses of the function sep(.,.) and the map $X \to AX + XA^T$ are given in


J. Varah (1979). “On the Separation of Two Matrices,” SIAM J. Num. Anal. 16, 
216-22. 

R. Byers and S.G. Nash (1987). “On the Singular Vectors of the Lyapunov Operator,”
SIAM J. Alg. and Disc. Methods 8, 59-66.


Gershgorin's Theorem can be used to derive a comprehensive perturbation theory. See 
Wilkinson (1965, chapter 2). The theorem itself can be generalized and extended in 
various ways; see 
R.S. Varga (1970). “Minimal Gershgorin Sets for Partitioned Matrices,” SIAM J. Num.
Anal. 7, 493-507.


R.J. Johnston (1971). “Gershgorin Theorems for Partitioned Matrices,” Lin. Alg. and 
Its Applic. 4, 205-20. 


7.3 Power Iterations


Suppose that we are given $A \in \mathbb{C}^{n\times n}$ and a unitary $U_0 \in \mathbb{C}^{n\times n}$. Assume
that Householder orthogonalization (Algorithm 5.2.1) can be extended to
complex matrices (it can) and consider the following iteration:

    T_0 = U_0^H A U_0
    for k = 1, 2, ...
        T_{k-1} = U_k R_k      (QR factorization)            (7.3.1)
        T_k = R_k U_k
    end

Since $T_k = R_kU_k = U_k^H(U_kR_k)U_k = U_k^H T_{k-1} U_k$ it follows by induction
that
\[ T_k = (U_0U_1\cdots U_k)^H A\,(U_0U_1\cdots U_k). \tag{7.3.2} \]
Thus, each $T_k$ is unitarily similar to A. Not so obvious, and what is the
central theme of this section, is that the $T_k$ almost always converge to
upper triangular form. That is, (7.3.2) almost always "converges" to a
Schur decomposition of A.

Iteration (7.3.1) is called the QR iteration, and it forms the backbone 
of the most effective algorithm for computing the Schur decomposition. 
In order to motivate the method and to derive its convergence properties, 
two other eigenvalue iterations that are important in their own right are 
presented first: the power method and the method of orthogonal iteration. 
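
Iteration (7.3.1) translates almost line for line into NumPy. The sketch below is an added illustration that takes $U_0 = I$ and runs a fixed number of steps; it is the plain unshifted iteration with no convergence test, not the practical algorithm the chapter builds toward. The function name and the symmetric test matrix are assumptions.

```python
import numpy as np

def qr_iteration(A, num_steps=100):
    """Unshifted QR iteration (7.3.1) with U_0 = I. Returns (T_k, U_0 U_1 ... U_k)."""
    T = np.array(A, dtype=complex)         # T_0 = U_0^H A U_0 = A
    U = np.eye(T.shape[0], dtype=complex)
    for _ in range(num_steps):
        Q, R = np.linalg.qr(T)             # T_{k-1} = U_k R_k
        T = R @ Q                          # T_k = R_k U_k
        U = U @ Q                          # accumulate the product in (7.3.2)
    return T, U

# Assumed test matrix with eigenvalues of distinct modulus.
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 1.0]])
T, U = qr_iteration(A, 100)
below_diagonal = np.linalg.norm(np.tril(T, -1))
```

For this matrix the strictly lower triangular part of T decays to rounding level and the diagonal of T holds the eigenvalues.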


7.3.1 The Power Method 


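Suppose $A \in \mathbb{C}^{n\times n}$ is diagonalizable, that $X^{-1}AX = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ with
$X = [\,x_1,\ldots,x_n\,]$, and $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$. Given a unit 2-norm
$q^{(0)} \in \mathbb{C}^n$, the power method produces a sequence of vectors $q^{(k)}$ as follows:

    for k = 1, 2, ...
        z^(k) = A q^(k-1)
        q^(k) = z^(k) / || z^(k) ||_2                         (7.3.3)
        λ^(k) = [q^(k)]^H A q^(k)
    end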


There is nothing special about doing a 2-norm normalization except that 
it imparts a greater unity on the overall discussion in this section. 
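Let us examine the convergence properties of the power iteration. If
\[ q^{(0)} = a_1x_1 + a_2x_2 + \cdots + a_nx_n \]
and $a_1 \neq 0$, then it follows that
\[ A^k q^{(0)} = a_1\lambda_1^k \left( x_1 + \sum_{j=2}^{n} \frac{a_j}{a_1} \left(\frac{\lambda_j}{\lambda_1}\right)^k x_j \right). \]
Since $q^{(k)} \in \mathrm{span}\{A^k q^{(0)}\}$ we conclude that
\[ \mathrm{dist}\big(\mathrm{span}\{q^{(k)}\}, \mathrm{span}\{x_1\}\big) = O\!\left( \left|\frac{\lambda_2}{\lambda_1}\right|^k \right) \]
and moreover,
\[ |\lambda_1 - \lambda^{(k)}| = O\!\left( \left|\frac{\lambda_2}{\lambda_1}\right|^k \right). \]
If $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$ then we say that $\lambda_1$ is a dominant eigenvalue.
Thus, the power method converges if $\lambda_1$ is dominant and if $q^{(0)}$ has a
component in the direction of the corresponding dominant eigenvector $x_1$.
The behavior of the iteration without these assumptions is discussed in
Wilkinson (1965, p.570) and Parlett and Poole (1973).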

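Example 7.3.1 If
\[ A = \begin{bmatrix} -261 & 209 & -49 \\ -530 & 422 & -98 \\ -800 & 631 & -144 \end{bmatrix}, \]
then $\lambda(A) = \{10, 4, 3\}$. Applying (7.3.3) with $q^{(0)} = [1, 0, 0]^T$ we find

(Table of $\lambda^{(k)}$ for k = 1, ..., 9.)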


In practice, the usefulness of the power method depends upon the ratio
$|\lambda_2|/|\lambda_1|$, since it dictates the rate of convergence. The danger that $q^{(0)}$ is
deficient in $x_1$ is a less worrisome matter because rounding errors sustained
during the iteration typically ensure that the subsequent $q^{(k)}$ have a component
in this direction. Moreover, it is typically the case in applications
where the dominant eigenvalue and eigenvector are desired that an a priori
estimate of $x_1$ is known. Normally, by setting $q^{(0)}$ to be this estimate, the
dangers of a small $a_1$ are minimized.

Note that the only thing required to implement the power method is a 
subroutine capable of computing matrix-vector products of the form Aq. 
It is not necessary to store A in an n-by-n array. For this reason, the 
algorithm can be of interest when A is large and sparse and when there is 
a sufficient gap between $|\lambda_1|$ and $|\lambda_2|$.

Estimates for the error $|\lambda^{(k)} - \lambda_1|$ can be obtained by applying the
perturbation theory developed in the previous section. Define the vector
$r^{(k)} = A q^{(k)} - \lambda^{(k)} q^{(k)}$ and observe that $(A + E^{(k)}) q^{(k)} = \lambda^{(k)} q^{(k)}$ where
$E^{(k)} = -r^{(k)} [q^{(k)}]^H$. Thus $\lambda^{(k)}$ is an eigenvalue of $A + E^{(k)}$ and

$$ |\lambda^{(k)} - \lambda_1| \approx \frac{\| E^{(k)} \|_2}{s(\lambda_1)} = \frac{\| r^{(k)} \|_2}{s(\lambda_1)}. $$

If we use the power method to generate approximate right and left dominant
eigenvectors, then it is possible to obtain an estimate of $s(\lambda_1)$. In particular,
if $w^{(k)}$ is a unit 2-norm vector in the direction of $(A^H)^k w^{(0)}$, then we can
use the approximation $s(\lambda_1) \approx | [w^{(k)}]^H q^{(k)} |$.
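The iteration (7.3.3) and the residual $r^{(k)}$ can be sketched in a few lines. The following is a minimal illustration for real data only, written in plain Python; the helper names (`matvec`, `norm2`, `power_method`) are hypothetical and not from the text. The matrix enters only through a routine that applies it to a vector, which is the point made above about large sparse problems.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product; any routine that applies A could be used instead."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def power_method(apply_A, q0, iters):
    """Run (7.3.3) for real data; return (lambda_k, q_k, ||r_k||_2)."""
    nrm = norm2(q0)
    q = [qi / nrm for qi in q0]
    lam = 0.0
    for _ in range(iters):
        z = apply_A(q)                                  # z = A q
        nrm = norm2(z)
        q = [zi / nrm for zi in z]                      # 2-norm normalization
        Aq = apply_A(q)
        lam = sum(qi * wi for qi, wi in zip(q, Aq))     # lambda = q^T A q
    r = [wi - lam * qi for wi, qi in zip(apply_A(q), q)]  # residual A q - lambda q
    return lam, q, norm2(r)

# Matrix of Example 7.3.1, whose eigenvalues are 10, 4, 3.
A = [[-261.0, 209.0, -49.0],
     [-530.0, 422.0, -98.0],
     [-800.0, 631.0, -144.0]]

lam, q, res = power_method(lambda x: matvec(A, x), [1.0, 0.0, 0.0], 60)
print(lam, res)
```

The returned residual norm is the quantity $\| r^{(k)} \|_2$ in the error estimate; dividing it by an estimate of $s(\lambda_1)$ would give the approximate error in $\lambda^{(k)}$.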


7.3.2 Orthogonal Iteration 


A straightforward generalization of the power method can be used to compute
higher-dimensional invariant subspaces. Let r be a chosen integer
satisfying $1 \le r \le n$. Given an n-by-r matrix $Q_0$ with orthonormal
columns, the method of orthogonal iteration generates a sequence of matrices
$\{Q_k\} \subseteq \mathbb{C}^{n \times r}$ as follows:


    for k = 1, 2, ...
        $Z_k = A Q_{k-1}$                              (7.3.4)
        $Q_k R_k = Z_k$   (QR factorization)
    end


Note that if r = 1, then this is just the power method. Moreover, the
sequence $\{Q_k e_1\}$ is precisely the sequence of vectors produced by the power
iteration with starting vector $q^{(0)} = Q_0 e_1$.
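A minimal sketch of (7.3.4) for real data follows, applied to the matrix of Example 7.3.1 with $Q_0 = [e_1, e_2]$. It is an illustration under stated assumptions rather than a reference implementation: the QR factorization is done by modified Gram-Schmidt (the text does not prescribe a method), and the helper names are hypothetical. Since the eigenvalues are 10, 4, 3, the 2-by-2 matrix $Q_k^T A Q_k$ should eventually have eigenvalues near 10 and 4, which is checked through its trace and determinant.

```python
import math

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def mgs_qr(Z):
    """Thin QR factorization Z = Q R by modified Gram-Schmidt (real data)."""
    n, r = len(Z), len(Z[0])
    cols = [[Z[i][j] for i in range(n)] for j in range(r)]
    R = [[0.0] * r for _ in range(r)]
    for j in range(r):
        for i in range(j):
            R[i][j] = sum(a * b for a, b in zip(cols[i], cols[j]))
            cols[j] = [b - R[i][j] * a for a, b in zip(cols[i], cols[j])]
        R[j][j] = math.sqrt(sum(b * b for b in cols[j]))
        cols[j] = [b / R[j][j] for b in cols[j]]
    Q = [[cols[j][i] for j in range(r)] for i in range(n)]
    return Q, R

def orthogonal_iteration(A, Q0, iters):
    """Iteration (7.3.4): Z = A Q, then Q R = Z."""
    Q = Q0
    for _ in range(iters):
        Z = matmul(A, Q)
        Q, R = mgs_qr(Z)
    return Q

A = [[-261.0, 209.0, -49.0],
     [-530.0, 422.0, -98.0],
     [-800.0, 631.0, -144.0]]
Q0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

Q = orthogonal_iteration(A, Q0, 200)
T11 = matmul(transpose(Q), matmul(A, Q))      # r-by-r matrix Q^T A Q
trace = T11[0][0] + T11[1][1]
det = T11[0][0] * T11[1][1] - T11[0][1] * T11[1][0]
print(trace, det)
```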

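In order to analyze the behavior of this iteration, suppose that

$$ Q^H A Q = T = \mathrm{diag}(\lambda_i) + N, \qquad |\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n| $$    (7.3.5)

is a Schur decomposition of $A \in \mathbb{C}^{n \times n}$. Assume that $1 \le r < n$ and partition
Q, T, and N as follows:

$$ Q = [\, Q_\alpha, \; Q_\beta \,], \qquad T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, \qquad N = \begin{bmatrix} N_{11} & N_{12} \\ 0 & N_{22} \end{bmatrix} $$    (7.3.6)

Here $Q_\alpha$ has r columns, $Q_\beta$ has n-r columns, $T_{11}$ and $N_{11}$ are r-by-r, and
$T_{22}$ and $N_{22}$ are (n-r)-by-(n-r).

If $|\lambda_r| > |\lambda_{r+1}|$, then the subspace $D_r(A) = \mathrm{ran}(Q_\alpha)$ is said to be a dominant
invariant subspace. It is the unique invariant subspace associated
with the eigenvalues $\lambda_1, \ldots, \lambda_r$. The following theorem shows that with reasonable
assumptions, the subspaces $\mathrm{ran}(Q_k)$ generated by (7.3.4) converge
to $D_r(A)$ at a rate proportional to $|\lambda_{r+1}/\lambda_r|^k$.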


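Theorem 7.3.1 Let the Schur decomposition of $A \in \mathbb{C}^{n \times n}$ be given by
(7.3.5) and (7.3.6) with $n \ge 2$. Assume that $|\lambda_r| > |\lambda_{r+1}|$ and that $\theta \ge 0$
satisfies

$$ (1 + \theta) |\lambda_r| > \| N \|_F. $$

If $Q_0 \in \mathbb{C}^{n \times r}$ has orthonormal columns and

$$ d = \mathrm{dist}(D_r(A^H), \mathrm{ran}(Q_0)) < 1, $$

then the matrices $Q_k$ generated by (7.3.4) satisfy

$$ \mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le \frac{(1+\theta)^{n-2}}{\sqrt{1 - d^2}} \left( 1 + \frac{\| T_{12} \|_F}{\mathrm{sep}(T_{11}, T_{22})} \right) \left[ \frac{|\lambda_{r+1}| + \| N \|_F/(1+\theta)}{|\lambda_r| - \| N \|_F/(1+\theta)} \right]^k. $$

Proof. The proof is given in an appendix at the end of this section. $\Box$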


The condition $d < 1$ in Theorem 7.3.1 ensures that the initial Q matrix is
not deficient in certain eigendirections:

$$ d < 1 \iff D_r(A^H)^{\perp} \cap \mathrm{ran}(Q_0) = \{0\}. $$

The theorem essentially says that if this condition holds and if $\theta$ is chosen
large enough, then

$$ \mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le c \left| \frac{\lambda_{r+1}}{\lambda_r} \right|^k $$

where c depends on $\mathrm{sep}(T_{11}, T_{22})$ and A's departure from normality. Needless
to say, convergence can be very slow if the gap between $|\lambda_r|$ and $|\lambda_{r+1}|$
is not sufficiently wide.


Example 7.3.2 If (7.3.4) is applied to the matrix A in Example 7.3.1, with $Q_0 = [e_1, e_2]$,
we find:

    k    dist(D_2(A), ran(Q_k))
    1    .0052
    2    .0047
    3    .0039
    4    .0030
    5    .0023
    6    .0017
    7    .0013

The error is tending to zero with rate $(\lambda_3/\lambda_2)^k = (3/4)^k$.


It is possible to accelerate the convergence in orthogonal iteration using
a technique described in Stewart (1976). In the accelerated scheme, the
approximate eigenvalue $\lambda_i^{(k)}$ satisfies

$$ |\lambda_i^{(k)} - \lambda_i| \approx \left| \frac{\lambda_{r+1}}{\lambda_i} \right|^k, \qquad i = 1{:}r. $$

(Without the acceleration, the right-hand side is $|\lambda_{i+1}/\lambda_i|^k$.) Stewart's algorithm
involves computing the Schur decomposition of the matrices $Q_k^T A Q_k$
every so often. The method can be very useful in situations where A is
large and sparse and a few of its largest eigenvalues are required.


7.3.3 The QR Iteration 


We now "derive" the QR iteration (7.3.1) and examine its convergence.
Suppose r = n in (7.3.4) and the eigenvalues of A satisfy

$$ |\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|. $$

Partition the matrix Q in (7.3.5) and $Q_k$ in (7.3.4) as follows:

$$ Q = [\, q_1, \ldots, q_n \,], \qquad Q_k = [\, q_1^{(k)}, \ldots, q_n^{(k)} \,]. $$

If

$$ \mathrm{dist}(D_i(A^H), \mathrm{span}\{ q_1^{(0)}, \ldots, q_i^{(0)} \}) < 1, \qquad i = 1{:}n $$    (7.3.7)

then it follows from Theorem 7.3.1 that

$$ \mathrm{dist}(\mathrm{span}\{ q_1^{(k)}, \ldots, q_i^{(k)} \}, \mathrm{span}\{ q_1, \ldots, q_i \}) \to 0 $$

for i = 1:n. This implies that the matrices $T_k$ defined by

$$ T_k = Q_k^H A Q_k $$

are converging to upper triangular form. Thus, it can be said that the
method of orthogonal iteration computes a Schur decomposition provided
the original iterate $Q_0 \in \mathbb{C}^{n \times n}$ is not deficient in the sense of (7.3.7).

The QR iteration arises naturally by considering how to compute the
matrix $T_k$ directly from its predecessor $T_{k-1}$. On the one hand, we have
from (7.3.4) and the definition of $T_{k-1}$ that

$$ T_{k-1} = Q_{k-1}^H A Q_{k-1} = Q_{k-1}^H (A Q_{k-1}) = (Q_{k-1}^H Q_k) R_k. $$

On the other hand

$$ T_k = Q_k^H A Q_k = (Q_k^H A Q_{k-1})(Q_{k-1}^H Q_k) = R_k (Q_{k-1}^H Q_k). $$

Thus, $T_k$ is determined by computing the QR factorization of $T_{k-1}$ and
then multiplying the factors together in reverse order. This is precisely
what is done in (7.3.1).


Example 7.3.3 If the iteration:

    for k = 1, 2, ...
        A = QR
        A = RQ
    end

is applied to the matrix of Example 7.3.1, then the strictly lower triangular elements
diminish as follows:

    [Table: k versus $O(|a_{21}|)$, $O(|a_{31}|)$, $O(|a_{32}|)$; the entries did not survive.]


Note that a single QR iteration is an $O(n^3)$ calculation. Moreover, since
convergence is only linear (when it exists), it is clear that the method is a
prohibitively expensive way to compute Schur decompositions. Fortunately
these practical difficulties can be overcome, as we show in §7.4 and §7.5.
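The factor-and-reverse recipe can be tried directly on the matrix of Example 7.3.1. The sketch below is illustrative only: it assumes real data, uses modified Gram-Schmidt for the QR factorization (an assumption; any QR method would do), and the helper names are hypothetical. Each pass costs $O(n^3)$ work, and the subdiagonal entries shrink only linearly, so many passes are needed.

```python
import math

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mgs_qr(A):
    """QR factorization A = Q R of a square real matrix by modified Gram-Schmidt."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j):
            R[i][j] = sum(a * b for a, b in zip(cols[i], cols[j]))
            cols[j] = [b - R[i][j] * a for a, b in zip(cols[i], cols[j])]
        R[j][j] = math.sqrt(sum(b * b for b in cols[j]))
        cols[j] = [b / R[j][j] for b in cols[j]]
    Q = [[cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def qr_iteration(A, iters):
    """Repeat: factor T = Q R, then overwrite T with R Q."""
    T = [row[:] for row in A]
    for _ in range(iters):
        Q, R = mgs_qr(T)
        T = matmul(R, Q)
    return T

A = [[-261.0, 209.0, -49.0],
     [-530.0, 422.0, -98.0],
     [-800.0, 631.0, -144.0]]

T = qr_iteration(A, 200)
diag = [T[i][i] for i in range(3)]
lower = max(abs(T[1][0]), abs(T[2][0]), abs(T[2][1]))
print(diag, lower)
```

With this starting matrix the diagonal settles near the eigenvalues in order of decreasing modulus, while the slowest subdiagonal entry decays roughly like $(3/4)^k$.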


7.3.4 LR Iterations 


We conclude with some remarks about power iterations that rely on the LU
factorization rather than the QR factorization. Let $G_0 \in \mathbb{C}^{n \times r}$ have rank r.
Corresponding to (7.3.4) we have the following iteration:


    for k = 1, 2, ...
        $Z_k = A G_{k-1}$                              (7.3.8)
        $Z_k = G_k R_k$   (LU factorization)
    end


Suppose r = n and that we define the matrices $T_k$ by

$$ T_k = G_k^{-1} A G_k. $$    (7.3.9)

It can be shown that if we set $L_0 = G_0$, then the $T_k$ can be generated as
follows:

    $T_0 = L_0^{-1} A L_0$
    for k = 1, 2, ...
        $T_{k-1} = L_k R_k$   (LU factorization)          (7.3.10)
        $T_k = R_k L_k$
    end


Iterations (7.3.8) and (7.3.10) are known as treppeniteration and the LR
iteration, respectively. Under reasonable assumptions, the $T_k$ converge to
upper triangular form. To successfully implement either method, it is necessary
to pivot. See Wilkinson (1965, p.602).
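As a small illustration of (7.3.10), the sketch below runs the LR iteration with $L_0 = I$ on a symmetric positive definite tridiagonal matrix. This test matrix is an assumption chosen for the example, not one from the text; it was picked because LU factorization without pivoting is safe for it, so the pivoting that a practical code needs is deliberately omitted. The helper names are hypothetical.

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def lu_nopivot(T):
    """Doolittle factorization T = L R, L unit lower triangular, no pivoting."""
    n = len(T)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    R = [row[:] for row in T]
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i][k] = R[i][k] / R[k][k]
            for j in range(k + 1, n):
                R[i][j] -= L[i][k] * R[k][j]
            R[i][k] = 0.0                 # eliminated entry
    return L, R

def lr_iteration(T0, iters):
    """Repeat: factor T = L R, then overwrite T with R L."""
    T = [row[:] for row in T0]
    for _ in range(iters):
        L, R = lu_nopivot(T)
        T = matmul(R, L)
    return T

# Assumed test matrix; its eigenvalues are 3 + sqrt(3), 3, and 3 - sqrt(3).
T0 = [[2.0, 1.0, 0.0],
      [1.0, 3.0, 1.0],
      [0.0, 1.0, 4.0]]

T = lr_iteration(T0, 200)
print([T[i][i] for i in range(3)])
```

For a general matrix a zero or tiny pivot can appear at any step, which is why the text insists on pivoting.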


Appendix 


In order to establish Theorem 7.3.1 we need the following lemma which is
concerned with bounding the powers of a matrix and its inverse.


Lemma 7.3.2 Let $Q^H A Q = T = D + N$ be a Schur decomposition of
$A \in \mathbb{C}^{n \times n}$ where D is diagonal and N strictly upper triangular. Let $\lambda$ and
$\mu$ denote the largest and smallest eigenvalues of A in absolute value. If
$\theta \ge 0$ then for all $k \ge 0$ we have

$$ \| A^k \|_2 \le (1+\theta)^{n-1} \left( |\lambda| + \frac{\| N \|_F}{1+\theta} \right)^k. $$    (7.3.11)

If A is nonsingular and $\theta \ge 0$ satisfies $(1+\theta)|\mu| > \| N \|_F$, then for all
$k \ge 0$ we also have

$$ \| A^{-k} \|_2 \le (1+\theta)^{n-1} \left( \frac{1}{|\mu| - \| N \|_F/(1+\theta)} \right)^k. $$    (7.3.12)
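Proof. For $\theta \ge 0$, define the diagonal matrix $\Delta$ by

$$ \Delta = \mathrm{diag}(1, (1+\theta), (1+\theta)^2, \ldots, (1+\theta)^{n-1}) $$

and note that $\kappa_2(\Delta) = (1+\theta)^{n-1}$. Since N is strictly upper triangular, it
is easy to verify that $\| \Delta N \Delta^{-1} \|_F \le \| N \|_F/(1+\theta)$. Thus,

$$ \| A^k \|_2 = \| T^k \|_2 = \| \Delta^{-1} (D + \Delta N \Delta^{-1})^k \Delta \|_2 $$
$$ \le \kappa_2(\Delta) \left( \| D \|_2 + \| \Delta N \Delta^{-1} \|_2 \right)^k $$
$$ \le (1+\theta)^{n-1} \left( |\lambda| + \frac{\| N \|_F}{1+\theta} \right)^k. $$

On the other hand, if A is nonsingular and $(1+\theta)|\mu| > \| N \|_F$, then
$\| \Delta D^{-1} N \Delta^{-1} \|_2 < 1$ and using Lemma 2.3.3 we obtain

$$ \| A^{-k} \|_2 = \| T^{-k} \|_2 = \| \Delta^{-1} [ (I + \Delta D^{-1} N \Delta^{-1})^{-1} D^{-1} ]^k \Delta \|_2 $$
$$ \le \kappa_2(\Delta) \left( \frac{\| D^{-1} \|_2}{1 - \| \Delta D^{-1} N \Delta^{-1} \|_2} \right)^k $$
$$ \le (1+\theta)^{n-1} \left( \frac{1}{|\mu| - \| N \|_F/(1+\theta)} \right)^k. \quad \Box $$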

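Proof of Theorem 7.3.1
It is easy to show by induction that $A^k Q_0 = Q_k (R_k \cdots R_1)$. By substituting
(7.3.5) and (7.3.6) into this equality we obtain

$$ T^k \begin{bmatrix} V_0 \\ W_0 \end{bmatrix} = \begin{bmatrix} V_k \\ W_k \end{bmatrix} (R_k \cdots R_1) $$

where $V_k = Q_\alpha^H Q_k$ and $W_k = Q_\beta^H Q_k$. Using Lemma 7.1.5 we know that a
matrix $X \in \mathbb{C}^{r \times (n-r)}$ exists such that

$$ \begin{bmatrix} I_r & X \\ 0 & I_{n-r} \end{bmatrix}^{-1} \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{bmatrix} I_r & X \\ 0 & I_{n-r} \end{bmatrix} = \begin{bmatrix} T_{11} & 0 \\ 0 & T_{22} \end{bmatrix} $$

and so

$$ \begin{bmatrix} T_{11} & 0 \\ 0 & T_{22} \end{bmatrix}^k \begin{bmatrix} V_0 - X W_0 \\ W_0 \end{bmatrix} = \begin{bmatrix} V_k - X W_k \\ W_k \end{bmatrix} (R_k \cdots R_1). $$

Below we establish that the matrix $V_0 - X W_0$ is nonsingular and this enables
us to obtain the following expression:

$$ W_k = T_{22}^k W_0 (V_0 - X W_0)^{-1} T_{11}^{-k} [\, I_r, \; -X \,] \begin{bmatrix} V_k \\ W_k \end{bmatrix}. $$

Recalling the definition of distance between subspaces from §2.6.3,

$$ \mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) = \| Q_\beta^H Q_k \|_2 = \| W_k \|_2. $$

Since

$$ \| [\, I_r, \; -X \,] \|_2 \le 1 + \| X \|_F $$

we have

$$ \mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le \| T_{22}^k \|_2 \, \| (V_0 - X W_0)^{-1} \|_2 \, \| T_{11}^{-k} \|_2 \, (1 + \| X \|_F). $$    (7.3.13)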

To prove the theorem we must look at each of the four factors in the upper
bound.

Since $\mathrm{sep}(T_{11}, T_{22})$ is the smallest singular value of the linear transformation
$\phi(X) = T_{11} X - X T_{22}$ it readily follows from $\phi(X) = -T_{12}$ that

$$ \| X \|_F \le \frac{\| T_{12} \|_F}{\mathrm{sep}(T_{11}, T_{22})}. $$    (7.3.14)

Using Lemma 7.3.2, it can be shown that

$$ \| T_{22}^k \|_2 \le (1+\theta)^{n-r-1} \left( |\lambda_{r+1}| + \frac{\| N \|_F}{1+\theta} \right)^k $$    (7.3.15)

and

$$ \| T_{11}^{-k} \|_2 \le (1+\theta)^{r-1} \Big/ \left( |\lambda_r| - \frac{\| N \|_F}{1+\theta} \right)^k. $$    (7.3.16)


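Finally, we turn our attention to the $\| (V_0 - X W_0)^{-1} \|$ factor. Note
that

$$ V_0 - X W_0 = Q_\alpha^H Q_0 - X Q_\beta^H Q_0 $$
$$ = [\, I_r, \; -X \,] \begin{bmatrix} Q_\alpha^H \\ Q_\beta^H \end{bmatrix} Q_0 $$
$$ = \left( [\, Q_\alpha, \; Q_\beta \,] \begin{bmatrix} I_r \\ -X^H \end{bmatrix} \right)^H Q_0 $$
$$ = (I_r + X X^H)^{1/2} (Z^H Q_0) $$

where

$$ Z = [\, Q_\alpha, \; Q_\beta \,] \begin{bmatrix} I_r \\ -X^H \end{bmatrix} (I_r + X X^H)^{-1/2} = (Q_\alpha - Q_\beta X^H)(I_r + X X^H)^{-1/2}. $$

The columns of this matrix are orthonormal. They are also a basis for
$D_r(A^H)$ because

$$ A^H (Q_\alpha - Q_\beta X^H) = (Q_\alpha - Q_\beta X^H) T_{11}^H. $$

This last fact follows from the equation $A^H Q = Q T^H$.
From Theorem 2.6.1

$$ d = \mathrm{dist}(D_r(A^H), \mathrm{ran}(Q_0)) = \sqrt{1 - \sigma_r(Z^H Q_0)^2} $$

and since $d < 1$ by hypothesis,

$$ \sigma_r(Z^H Q_0) > 0. $$

This shows that

$$ (V_0 - X W_0) = (I_r + X X^H)^{1/2} (Z^H Q_0) $$

is nonsingular and thus,

$$ \| (V_0 - X W_0)^{-1} \|_2 \le \| (I_r + X X^H)^{-1/2} \|_2 \, \| (Z^H Q_0)^{-1} \|_2 \le 1/\sqrt{1 - d^2}. $$    (7.3.17)

The theorem follows by substituting (7.3.14)-(7.3.17) into (7.3.13). $\Box$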


Problems 


P7.3.1 (a) Show that if $X \in \mathbb{C}^{n \times n}$ is nonsingular, then $\| A \|_X = \| X^{-1} A X \|_2$ defines
a matrix norm with the property that $\| AB \|_X \le \| A \|_X \| B \|_X$. (b) Let $A \in \mathbb{C}^{n \times n}$ and
set $\rho = \max |\lambda_i|$. Show that for any $\epsilon > 0$ there exists a nonsingular $X \in \mathbb{C}^{n \times n}$ such that
$\| A \|_X = \| X^{-1} A X \|_2 \le \rho + \epsilon$. Conclude that there is a constant M such that $\| A^k \|_2
\le M (\rho + \epsilon)^k$ for all non-negative integers k. (Hint: Set $X = Q \, \mathrm{diag}(1, a, \ldots, a^{n-1})$
where $Q^H A Q = D + N$ is A's Schur decomposition.)

P7.3.2 Verify that (7.3.10) calculates the matrices $T_k$ defined by (7.3.9).


P7.3.3 Suppose $A \in \mathbb{C}^{n \times n}$ is nonsingular and that $Q_0 \in \mathbb{C}^{n \times p}$ has orthonormal columns.
The following iteration is referred to as inverse orthogonal iteration.


    for k = 1, 2, ...
        Solve $A Z_k = Q_{k-1}$ for $Z_k \in \mathbb{C}^{n \times p}$
        $Z_k = Q_k R_k$   (QR factorization)
    end


Explain why this iteration can usually be used to compute the p smallest eigenvalues
of A in absolute value. Note that to implement this iteration it is necessary to be able
to solve linear systems that involve A. When p = 1, the method is referred to as the
inverse power method.


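P7.3.4 Assume $A \in \mathbb{R}^{n \times n}$ has eigenvalues $\lambda_1, \ldots, \lambda_n$ that satisfy

$$ \lambda = \lambda_1 = \lambda_2 = \lambda_3 = \lambda_4 > |\lambda_5| \ge \cdots \ge |\lambda_n| $$

where $\lambda$ is positive. Assume that A has two Jordan blocks of the form

$$ \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}. $$

Discuss the convergence properties of the power method when applied to this matrix.
Discuss how the convergence might be accelerated.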


Notes and References for Sec. 7.3 


A detailed, practical discussion of the power method is given in Wilkinson (1965, chapter
10). Methods are discussed for accelerating the basic iteration, for calculating nondomi- 
nant eigenvalues, and for handling complex conjugate eigenvalue pairs. The connections 
among the various power iterations are discussed in 


B.N. Parlett and W.G. Poole (1973). “A Geometric Theory for the QR, LU, and Power 
Iterations,” SIAM J. Num. Anal. 10, 389-412. 


The QR iteration was concurrently developed in 


J.G.F. Francis (1961). "The QR Transformation: A Unitary Analogue to the LR Trans- 
formation," Comp. J. 4, 265-71, 332-34. 

V.N. Kublanovskaya (1961). “On Some Algorithms for the Solution of the Complete 
Eigenvalue Problem,” USSR Comp. Math. Phys. 3, 637-57. 


As can be deduced from the title of the first paper, the LR iteration predates the QR
iteration. The former very fundamental algorithm was proposed by

H. Rutishauser (1958). “Solution of Eigenvalue Problems with the LR Transformation,”
Nat. Bur. Stand. App. Math. Ser. 49, 47-81.
B.N. Parlett (1995). “The New qd Algorithms,” ACTA Numerica 5, 459-491.


Numerous papers on the convergence of the QR iteration have appeared. Several of these 
are 


J.H. Wilkinson (1965). “Convergence of the LR, QR, and Related Algorithms,” Comp. 
J. 8, 77-84. 

B.N. Parlett (1965). “Convergence of the Q-R Algorithm,” Numer. Math. 7, 187-93. 
(Correction in Numer. Math. 10, 163-64.) 

B.N. Parlett (1966). “Singular and Invariant Matrices Under the QR Algorithm,” Math. 
Comp. 20, 611-15. 

B.N. Parlett (1968). “Global Convergence of the Basic QR Algorithm on Hessenberg 
Matrices,” Math. Comp. 22, 803-17. 


Wilkinson (AEP, chapter 9) also discusses the convergence theory for this important 
algorithm. 

Deeper insight into the convergence of the QR algorithm and its connection to other 
important algorithms can be attained by reading 


D.S. Watkins (1982). “Understanding the QR Algorithm,” SIAM Review 24, 421-440.

T. Nanda (1985). “Differential Equations and the QR Algorithm,” SIAM J. Numer. 
Anal. 22, 310-321. 

D.S. Watkins (1993). “Some Perspectives on the Eigenvalue Problem,” SIAM Review 
35, 430-471. 


The following papers are concerned with various practical and theoretical aspects of si- 
multaneous iteration: 


H. Rutishauser (1970). “Simultaneous Iteration Method for Symmetric Matrices,” Numer.
Math. 16, 205-23. See also Wilkinson and Reinsch (1971, pp. 284-302).

M. Clint and A. Jennings (1971). “A Simultaneous Iteration Method for the Unsymmetric
Eigenvalue Problem,” J. Inst. Math. Applic. 8, 111-21.

A. Jennings and D.R.L. Orr (1971). “Application of the Simultaneous Iteration Method
to Undamped Vibration Problems,” Int. J. Numer. Meth. Eng. 3, 13-24.

A. Jennings and W.J. Stewart (1975). “Simultaneous Iteration for the Partial Eigenso- 
lution of Real Matrices,” J. Inst. Math. Applic. 15, 351-62. 

G.W. Stewart (1975). “Methods of Simultaneous Iteration for Calculating Eigenvectors 
of Matrices,” in Topics in Numerical Analysis II , ed. John J.H. Miller, Academic 
Press, New York, pp. 185-96. 

G.W. Stewart (1976). “Simultaneous Iteration for Computing Invariant Subspaces of 
Non-Hermitian Matrices,” Numer. Math. 25, 123-36. 


See also chapter 10 of 
A. Jennings (1977). Matrix Computation for Engineers and Scientists, John Wiley and
Sons, New York. 


Simultaneous iteration and the Lanczos algorithm (cf. Chapter 9) are the principal methods
for finding a few eigenvalues of a general sparse matrix.


7.4 The Hessenberg and Real Schur Forms


In this and the next section we show how to make the QR iteration (7.3.1)
a fast, effective method for computing Schur decompositions. Because the
majority of eigenvalue/invariant subspace problems involve real data, we
concentrate on developing the real analog of (7.3.1) which we write as follows:


    $H_0 = U_0^T A U_0$
    for k = 1, 2, ...
        $H_{k-1} = U_k R_k$   (QR factorization)          (7.4.1)
        $H_k = R_k U_k$
    end


Here, $A \in \mathbb{R}^{n \times n}$, each $U_k \in \mathbb{R}^{n \times n}$ is orthogonal, and each $R_k \in \mathbb{R}^{n \times n}$ is
upper triangular. A difficulty associated with this real iteration is that the
$H_k$ can never converge to strict, "eigenvalue revealing," triangular form
in the event that A has complex eigenvalues. For this reason, we must
lower our expectations and be content with the calculation of an alternative
decomposition known as the real Schur decomposition.

In order to compute the real Schur decomposition efficiently we must
carefully choose the initial orthogonal similarity transformation $U_0$ in (7.4.1).
In particular, if we choose $U_0$ so that $H_0$ is upper Hessenberg, then the
amount of work per iteration is reduced from $O(n^3)$ to $O(n^2)$. The initial
reduction to Hessenberg form (the $U_0$ computation) is a very important
computation in its own right and can be realized by a sequence of Householder
matrix operations.


7.4.1 The Real Schur Decomposition 


A block upper triangular matrix with either 1-by-1 or 2-by-2 diagonal blocks
is upper quasi-triangular. The real Schur decomposition amounts to a real
reduction to upper quasi-triangular form.


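Theorem 7.4.1 (Real Schur Decomposition) If $A \in \mathbb{R}^{n \times n}$, then there
exists an orthogonal $Q \in \mathbb{R}^{n \times n}$ such that

$$ Q^T A Q = \begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1m} \\ 0 & R_{22} & \cdots & R_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_{mm} \end{bmatrix} $$    (7.4.2)

where each $R_{ii}$ is either a 1-by-1 matrix or a 2-by-2 matrix having complex
conjugate eigenvalues.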


Proof. The complex eigenvalues of A must come in conjugate pairs, since
the characteristic polynomial $\det(zI - A)$ has real coefficients. Let k be
the number of complex conjugate pairs in $\lambda(A)$. We prove the theorem by
induction on k. Observe first that Lemma 7.1.2 and Theorem 7.1.3 have
obvious real analogs. Thus, the theorem holds if k = 0. Now suppose that
$k \ge 1$. If $\lambda = \gamma + i\mu \in \lambda(A)$ and $\mu \ne 0$, then there exist vectors y and z in
$\mathbb{R}^n$ ($z \ne 0$) such that $A(y + iz) = (\gamma + i\mu)(y + iz)$, i.e.,

$$ A [\, y \;\; z \,] = [\, y \;\; z \,] \begin{bmatrix} \gamma & \mu \\ -\mu & \gamma \end{bmatrix}. $$

The assumption that $\mu \ne 0$ implies that y and z span a two-dimensional,
real invariant subspace for A. It then follows from Lemma 7.1.2 that an
orthogonal $U \in \mathbb{R}^{n \times n}$ exists such that

$$ U^T A U = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} $$

(with block rows and block columns of sizes 2 and n-2) where $\lambda(T_{11}) = \{\lambda, \bar{\lambda}\}$.
By induction, there exists an orthogonal $\tilde{U}$ so
$\tilde{U}^T T_{22} \tilde{U}$ has the required structure. The theorem follows by setting
$Q = U \, \mathrm{diag}(I_2, \tilde{U})$. $\Box$


The theorem shows that any real matrix is orthogonally similar to an upper 
quasi-triangular matrix. It is clear that the real and imaginary part of the 
complex eigenvalues can be easily obtained from the 2-by-2 diagonal blocks. 
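To make the last remark concrete, here is a small sketch that reads the eigenvalues off a real upper quasi-triangular matrix by walking down its diagonal. The function names and the test matrix are hypothetical, and the 2-by-2 formula is the plain quadratic formula, written without the scaling safeguards a production code would use.

```python
import math

def block_eigs(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] as (real part, imaginary part) pairs."""
    mean = 0.5 * (a + d)
    disc = 0.25 * (a - d) ** 2 + b * c          # discriminant of the quadratic
    if disc < 0.0:
        mu = math.sqrt(-disc)
        return [(mean, mu), (mean, -mu)]        # complex conjugate pair
    root = math.sqrt(disc)
    return [(mean + root, 0.0), (mean - root, 0.0)]

def quasi_triangular_eigs(T):
    """Collect the eigenvalues of the 1-by-1 and 2-by-2 diagonal blocks of T."""
    n, i, eigs = len(T), 0, []
    while i < n:
        if i + 1 < n and T[i + 1][i] != 0.0:    # a 2-by-2 block starts here
            eigs += block_eigs(T[i][i], T[i][i + 1], T[i + 1][i], T[i + 1][i + 1])
            i += 2
        else:
            eigs.append((T[i][i], 0.0))
            i += 1
    return eigs

# Assumed example: a 2-by-2 block with eigenvalues 1 +/- 2i, then a 1-by-1 block.
T = [[1.0, 2.0, 5.0],
     [-2.0, 1.0, 7.0],
     [0.0, 0.0, 3.0]]
eigs = quasi_triangular_eigs(T)
print(eigs)
```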


7.4.2 A Hessenberg QR Step 


We now turn our attention to the speedy calculation of a single QR step
in (7.4.1). In this regard, the most glaring shortcoming associated with
(7.4.1) is that each step requires a full QR factorization costing $O(n^3)$ flops.
Fortunately, the amount of work per iteration can be reduced by an order of
magnitude if the orthogonal matrix $U_0$ is judiciously chosen. In particular,
if $U_0^T A U_0 = H_0 = (h_{ij})$ is upper Hessenberg ($h_{ij} = 0$, $i > j+1$), then each
subsequent $H_k$ requires only $O(n^2)$ flops to calculate. To see this we look at
the computations $H = QR$ and $H_+ = RQ$ when H is upper Hessenberg. As
described in §5.2.4, we can upper triangularize H with a sequence of n-1
Givens rotations: $Q^T H = G_{n-1}^T \cdots G_1^T H = R$. Here, $G_i = G(i, i+1, \theta_i)$.
For the n = 4 case there are three Givens premultiplications:


    x x x x       x x x x       x x x x       x x x x
    x x x x  -->  0 x x x  -->  0 x x x  -->  0 x x x
    0 x x x       0 x x x       0 0 x x       0 0 x x
    0 0 x x       0 0 x x       0 0 x x       0 0 0 x


See Algorithm 5.2.3.
The computation $RQ = R(G_1 \cdots G_{n-1})$ is equally easy to implement.
In the n = 4 case there are three Givens post-multiplications:


    x x x x       x x x x       x x x x       x x x x
    0 x x x  -->  x x x x  -->  x x x x  -->  x x x x
    0 0 x x       0 0 x x       0 x x x       0 x x x
    0 0 0 x       0 0 0 x       0 0 0 x       0 0 x x

Overall we obtain the following algorithm: 


Algorithm 7.4.1 If H is an n-by-n upper Hessenberg matrix, then this
algorithm overwrites H with $H_+ = RQ$ where $H = QR$ is the QR factorization
of H.

    for k = 1:n-1
        [c(k), s(k)] = givens(H(k,k), H(k+1,k))
        H(k:k+1, k:n) = [c(k) s(k); -s(k) c(k)]^T H(k:k+1, k:n)
    end
    for k = 1:n-1
        H(1:k+1, k:k+1) = H(1:k+1, k:k+1) [c(k) s(k); -s(k) c(k)]
    end


Let $G_k = G(k, k+1, \theta_k)$ be the kth Givens rotation. It is easy to confirm
that the matrix $Q = G_1 \cdots G_{n-1}$ is upper Hessenberg. Thus, $RQ = H_+$ is
also upper Hessenberg. The algorithm requires about $6n^2$ flops and thus is
an order-of-magnitude quicker than a full matrix QR step (7.3.1).
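A direct transcription of Algorithm 7.4.1 into plain Python might look as follows. This is a sketch only: the `givens` helper below is an assumed, simplified stand-in for the routine of §5.1 (it uses `math.hypot` rather than the text's overflow-guarded formulas), and it follows the convention that $[c \; s; -s \; c]^T$ applied to $(a, b)^T$ zeros the second component.

```python
import math

def givens(a, b):
    """Return (c, s) so that [[c, s], [-s, c]]^T applied to (a, b) gives (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, -b / r

def hessenberg_qr_step(H):
    """Return H_+ = R Q where H = Q R and H is upper Hessenberg."""
    n = len(H)
    H = [row[:] for row in H]
    rot = []
    for k in range(n - 1):                      # Q^T H = R via n-1 rotations
        c, s = givens(H[k][k], H[k + 1][k])
        rot.append((c, s))
        for j in range(k, n):
            t1, t2 = H[k][j], H[k + 1][j]
            H[k][j] = c * t1 - s * t2
            H[k + 1][j] = s * t1 + c * t2
    for k in range(n - 1):                      # R G_1 ... G_{n-1}
        c, s = rot[k]
        for i in range(k + 2):                  # only rows 0..k+1 are affected
            t1, t2 = H[i][k], H[i][k + 1]
            H[i][k] = c * t1 - s * t2
            H[i][k + 1] = s * t1 + c * t2
    return H

H = [[3.0, 1.0, 2.0],
     [4.0, 2.0, 3.0],
     [0.0, 0.01, 1.0]]
Hp = hessenberg_qr_step(H)
for row in Hp:
    print(row)
```

Each rotation touches two rows or two columns of at most n entries, which is where the $O(n^2)$ operation count comes from.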


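Example 7.4.1 If Algorithm 7.4.1 is applied to:

$$ H = \begin{bmatrix} 3 & 1 & 2 \\ 4 & 2 & 3 \\ 0 & .01 & 1 \end{bmatrix}, $$

then

$$ G_1 = \begin{bmatrix} .6 & -.8 & 0 \\ .8 & .6 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad G_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & .9996 & -.0249 \\ 0 & .0249 & .9996 \end{bmatrix}, $$

and

$$ H_+ = \begin{bmatrix} 4.7600 & -2.5442 & 5.4653 \\ .3200 & .4856 & -2.1796 \\ .0000 & .0263 & 1.0540 \end{bmatrix}. $$


7.4.3 The Hessenberg Reduction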
It remains for us to show how the Hessenberg decomposition

$$ U_0^T A U_0 = H, \qquad U_0^T U_0 = I $$    (7.4.3)

can be computed. The transformation $U_0$ can be computed as a product
of Householder matrices $P_1, \ldots, P_{n-2}$. The role of $P_k$ is to zero the kth
column below the subdiagonal. In the n = 6 case, we have


    x x x x x x         x x x x x x
    x x x x x x         x x x x x x
    x x x x x x   P_1   0 x x x x x   P_2
    x x x x x x   -->   0 x x x x x   -->
    x x x x x x         0 x x x x x
    x x x x x x         0 x x x x x

    x x x x x x         x x x x x x
    x x x x x x         x x x x x x
    0 x x x x x   P_3   0 x x x x x   P_4
    0 0 x x x x   -->   0 0 x x x x   -->
    0 0 x x x x         0 0 0 x x x
    0 0 x x x x         0 0 0 x x x

    x x x x x x
    x x x x x x
    0 x x x x x
    0 0 x x x x
    0 0 0 x x x
    0 0 0 0 x x


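In general, after k-1 steps we have computed k-1 Householder matrices
$P_1, \ldots, P_{k-1}$ such that

$$ (P_1 \cdots P_{k-1})^T A (P_1 \cdots P_{k-1}) = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{bmatrix} $$

(with block rows and block columns of sizes k-1, 1, and n-k) is upper Hessenberg
through its first k-1 columns. Suppose $\bar{P}_k$ is an order n-k Householder matrix
such that $\bar{P}_k B_{32}$ is a multiple of $e_1^{(n-k)}$. If $P_k = \mathrm{diag}(I_k, \bar{P}_k)$, then

$$ (P_1 \cdots P_k)^T A (P_1 \cdots P_k) = \begin{bmatrix} B_{11} & B_{12} & B_{13} \bar{P}_k \\ B_{21} & B_{22} & B_{23} \bar{P}_k \\ 0 & \bar{P}_k B_{32} & \bar{P}_k B_{33} \bar{P}_k \end{bmatrix} $$

is upper Hessenberg through its first k columns. Repeating this for k =
1:n-2 we obtain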

Algorithm 7.4.2 (Householder Reduction to Hessenberg Form)
Given $A \in \mathbb{R}^{n \times n}$, the following algorithm overwrites A with $H = U_0^T A U_0$
where H is upper Hessenberg and $U_0$ is a product of Householder matrices.


    for k = 1:n-2
        [v, β] = house(A(k+1:n, k))
        A(k+1:n, k:n) = (I - β v v^T) A(k+1:n, k:n)
        A(1:n, k+1:n) = A(1:n, k+1:n) (I - β v v^T)
    end


This algorithm requires $10n^3/3$ flops. If $U_0$ is explicitly formed, an additional
$4n^3/3$ flops are required. The kth Householder matrix can be represented
in A(k+2:n, k). See Martin and Wilkinson (1968d) for a detailed
description.
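The reduction can be sketched in plain Python as below. Two caveats apply: the `house` helper here is an assumed variant that picks the sign of the reflected vector to avoid cancellation, so it maps x to $\mp \| x \|_2 e_1$ and does not normalize v(1) = 1 as the text's house does; and the 3-by-3 matrix is an illustrative choice, not data recovered from the text. The resulting H can therefore differ from another implementation's output by a diagonal $\pm 1$ similarity.

```python
import math

def house(x):
    """Return (v, beta) so that (I - beta v v^T) x is a multiple of e_1."""
    sigma = sum(xi * xi for xi in x[1:])
    v = list(x)
    if sigma == 0.0:
        return v, 0.0                            # x is already a multiple of e_1
    alpha = math.sqrt(x[0] * x[0] + sigma)
    v[0] = x[0] + math.copysign(alpha, x[0])     # sign chosen to avoid cancellation
    beta = 2.0 / (v[0] * v[0] + sigma)
    return v, beta

def hessenberg(A):
    """Return H = U0^T A U0, upper Hessenberg, via n-2 Householder similarity steps."""
    n = len(A)
    H = [row[:] for row in A]
    for k in range(n - 2):
        v, beta = house([H[i][k] for i in range(k + 1, n)])
        for j in range(k, n):                    # rows k+1..n-1: (I - beta v v^T) H
            w = beta * sum(v[i - k - 1] * H[i][j] for i in range(k + 1, n))
            for i in range(k + 1, n):
                H[i][j] -= w * v[i - k - 1]
        for i in range(n):                       # columns k+1..n-1: H (I - beta v v^T)
            w = beta * sum(H[i][j] * v[j - k - 1] for j in range(k + 1, n))
            for j in range(k + 1, n):
                H[i][j] -= w * v[j - k - 1]
    return H

A = [[1.0, 5.0, 7.0],
     [3.0, 0.0, 6.0],
     [4.0, 3.0, 1.0]]
H = hessenberg(A)
for row in H:
    print(row)
```

Because the transformation is an orthogonal similarity, the trace and the diagonal structure of eigenvalues are preserved, and the subdiagonal entry H(2,1) has magnitude $\| (3, 4)^T \|_2 = 5$.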

The roundoff properties of this method for reducing A to Hessenberg
form are very desirable. Wilkinson (1965, p.351) states that the computed
Hessenberg matrix $\hat{H}$ satisfies $\hat{H} = Q^T (A + E) Q$, where Q is orthogonal
and $\| E \|_F \le c n^2 \mathbf{u} \| A \|_F$ with c a small constant.


Example 7.4.2 If 


then 


7.4.4 Level-3 Aspects 


The Hessenberg reduction (Algorithm 7.4.2) is rich in level-2 operations:
half gaxpys and half outer product updates. We briefly discuss two methods
for introducing level-3 computations into the process.

The first approach involves a block reduction to block Hessenberg form
and is quite straightforward. Suppose (for clarity) that n = rN and write

$$ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} $$

with block rows and block columns of sizes r and n-r.
Suppose that we have computed the QR factorization $A_{21} = \tilde{Q}_1 R_1$ and
that $\tilde{Q}_1$ is in WY form. That is, we have $W_1, Y_1 \in \mathbb{R}^{(n-r) \times r}$ such that
$\tilde{Q}_1 = I - W_1 Y_1^T$. (See §5.2.2 for details.) If $Q_1 = \mathrm{diag}(I_r, \tilde{Q}_1)$ then

$$ Q_1^T A Q_1 = \begin{bmatrix} A_{11} & A_{12} \tilde{Q}_1 \\ R_1 & \tilde{Q}_1^T A_{22} \tilde{Q}_1 \end{bmatrix}. $$

Notice that the updates of the (1,2) and (2,2) blocks are rich in level-3
operations given that $\tilde{Q}_1$ is in WY form. This fully illustrates the overall
process as $Q_1^T A Q_1$ is block upper Hessenberg through its first block column.
We next repeat the computations on the first r columns of $\tilde{Q}_1^T A_{22} \tilde{Q}_1$. After
N-2 such steps we obtain

$$ H = U_0^T A U_0 = \begin{bmatrix} H_{11} & H_{12} & \cdots & \cdots & H_{1N} \\ H_{21} & H_{22} & \cdots & \cdots & H_{2N} \\ 0 & \ddots & \ddots & & \vdots \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & H_{N,N-1} & H_{NN} \end{bmatrix} $$

where each $H_{ij}$ is r-by-r and $U_0 = Q_1 \cdots Q_{N-2}$ with each $Q_i$ in WY
form. The overall algorithm has a level-3 fraction of the form 1 - O(1/N).

Note that the subdiagonal blocks in H are upper triangular and so the
matrix has lower bandwidth r. It is possible to reduce H to actual Hessenberg
form by using Givens rotations to zero all but the first subdiagonal.

Dongarra, Hammarling and Sorensen (1987) have shown how to proceed
directly to Hessenberg form using a mixture of gaxpy's and level-3 updates.
Their idea involves minimal updating after each Householder transformation
is generated. For example, suppose the first Householder $P_1$ has been
computed. To generate $P_2$ we need just the second column of $P_1 A P_1$, not
the full outer product update. To generate $P_3$ we need just the 3rd column
of $P_2 P_1 A P_1 P_2$, etc. In this way, the Householder matrices can be
determined using only gaxpy operations. No outer product updates are
involved. Once a suitable number of Householder matrices are known they
can be aggregated and applied in a level-3 fashion.


7.4.5 Important Hessenberg Matrix Properties 


The Hessenberg decomposition is not unique. If Z is any n-by-n orthogonal
matrix and we apply Algorithm 7.4.2 to $Z^T A Z$, then $Q^T A Q = H$ is upper
Hessenberg where $Q = Z U_0$. However, $Q e_1 = Z (U_0 e_1) = Z e_1$ suggesting
that H is unique once the first column of Q is specified. This is essentially
the case provided H has no zero subdiagonal entries. Hessenberg matrices
with this property are said to be unreduced. Here is a very important
theorem that clarifies the uniqueness of the Hessenberg reduction.


Theorem 7.4.2 (Implicit Q Theorem) Suppose $Q = [\, q_1, \ldots, q_n \,]$ and
$V = [\, v_1, \ldots, v_n \,]$ are orthogonal matrices with the property that both $Q^T A Q
= H$ and $V^T A V = G$ are upper Hessenberg where $A \in \mathbb{R}^{n \times n}$. Let k denote
the smallest positive integer for which $h_{k+1,k} = 0$, with the convention that
k = n if H is unreduced. If $q_1 = v_1$, then $q_i = \pm v_i$ and $|h_{i,i-1}| = |g_{i,i-1}|$
for i = 2:k. Moreover, if k < n, then $g_{k+1,k} = 0$.


Proof. Define the orthogonal matrix $W = [\, w_1, \ldots, w_n \,] = V^T Q$ and
observe that $GW = WH$. By comparing column i-1 in this equation for
i = 2:k we see that

$$ h_{i,i-1} w_i = G w_{i-1} - \sum_{j=1}^{i-1} h_{j,i-1} w_j. $$

Since $w_1 = e_1$, it follows that $[\, w_1, \ldots, w_k \,]$ is upper triangular and thus $w_i
= \pm I_n(:, i) = \pm e_i$ for i = 2:k. Since $w_i = V^T q_i$ and $h_{i,i-1} = w_i^T G w_{i-1}$ it
follows that $v_i = \pm q_i$ and

$$ |h_{i,i-1}| = |q_i^T A q_{i-1}| = |v_i^T A v_{i-1}| = |g_{i,i-1}| $$

for i = 2:k. If k < n, then

$$ g_{k+1,k} = e_{k+1}^T G e_k = \pm e_{k+1}^T G W e_k = \pm e_{k+1}^T W H e_k = \pm e_{k+1}^T \sum_{i=1}^{k} h_{ik} W e_i = \pm \sum_{i=1}^{k} \pm h_{ik} \, e_{k+1}^T e_i = 0. \quad \Box $$


The gist of the implicit Q theorem is that if $Q^T A Q = H$ and $Z^T A Z = G$
are each unreduced upper Hessenberg matrices and Q and Z have the same
first column, then G and H are "essentially equal" in the sense that $G =
D^{-1} H D$ where $D = \mathrm{diag}(\pm 1, \ldots, \pm 1)$.

Our next theorem involves a new type of matrix called a Krylov matrix.
If $A \in \mathbb{R}^{n \times n}$ and $v \in \mathbb{R}^n$, then the Krylov matrix $K(A, v, j) \in \mathbb{R}^{n \times j}$
is defined by

$$ K(A, v, j) = [\, v, \; Av, \; \ldots, \; A^{j-1} v \,]. $$

It turns out that there is a connection between the Hessenberg reduction
$Q^T A Q = H$ and the QR factorization of the Krylov matrix $K(A, Q(:,1), n)$.


Theorem 7.4.3 Suppose $Q \in \mathbb{R}^{n \times n}$ is an orthogonal matrix and $A \in \mathbb{R}^{n \times n}$.
Then $Q^T A Q = H$ is an unreduced upper Hessenberg matrix if and only if
$Q^T K(A, Q(:,1), n) = R$ is nonsingular and upper triangular.


Proof. Suppose $Q \in \mathbb{R}^{n \times n}$ is orthogonal and set $H = Q^T A Q$. Consider
the identity

$$ Q^T K(A, Q(:,1), n) = [\, e_1, \; H e_1, \; \ldots, \; H^{n-1} e_1 \,] \equiv R. $$

If H is an unreduced upper Hessenberg matrix, then it is clear that R is
upper triangular with $r_{ii} = h_{21} h_{32} \cdots h_{i,i-1}$ for i = 2:n. Since $r_{11} = 1$ it
follows that R is nonsingular.
To prove the converse, suppose R is upper triangular and nonsingular.
Since $R(:, k+1) = H R(:, k)$ it follows that $H(:, k) \in \mathrm{span}\{ e_1, \ldots, e_{k+1} \}$.
This implies that H is upper Hessenberg. Since $r_{nn} = h_{21} h_{32} \cdots h_{n,n-1} \ne 0$
it follows that H is also unreduced. $\Box$


Thus, there is more or less a correspondence between nonsingular Krylov 
matrices and orthogonal similarity reductions to unreduced Hessenberg 
form. 
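The forward direction of Theorem 7.4.3 is easy to check numerically in the special case Q = I, where A = H and $Q^T K(A, Q(:,1), n)$ is just $K(H, e_1, n)$. The sketch below (hypothetical helper names, with the 3-by-3 Hessenberg matrix of Example 7.4.1 reused as test data) builds the Krylov matrix and shows that it is upper triangular with diagonal 1, $h_{21}$, $h_{21} h_{32}$.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def krylov(A, v, j):
    """Return K(A, v, j) = [v, Av, ..., A^(j-1) v] as a list of rows."""
    cols = [list(v)]
    for _ in range(j - 1):
        cols.append(matvec(A, cols[-1]))
    return [[cols[c][i] for c in range(j)] for i in range(len(v))]

# Unreduced upper Hessenberg matrix (subdiagonal entries 4 and 0.01).
H = [[3.0, 1.0, 2.0],
     [4.0, 2.0, 3.0],
     [0.0, 0.01, 1.0]]

R = krylov(H, [1.0, 0.0, 0.0], 3)    # with Q = I this is Q^T K(A, Q(:,1), n)
for row in R:
    print(row)
```

The tiny (3,3) entry of R already hints at the conditioning problems with Krylov matrices that are raised in the discussion of companion matrix methods.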

Our last result concerns eigenvalues of an unreduced upper Hessenberg 
matrix. 


Theorem 7.4.4 If $\lambda$ is an eigenvalue of an unreduced upper Hessenberg
matrix $H \in \mathbb{R}^{n \times n}$, then its geometric multiplicity is one.


Proof. For any $\lambda \in \mathbb{C}$ we have $\mathrm{rank}(H - \lambda I) \ge n - 1$ because the first
n-1 columns of $H - \lambda I$ are independent. $\Box$


7.4.6 Companion Matrix Form 


Just as the Schur decomposition has a nonunitary analog in the Jordan
decomposition, so does the Hessenberg decomposition have a nonunitary
analog in the companion matrix decomposition. Let $x \in \mathbb{R}^n$ and suppose
that the Krylov matrix $K = K(A, x, n)$ is nonsingular. If $c = c(0{:}n-1)$
solves the linear system $Kc = -A^n x$, then it follows that $AK = KC$ where
C has the form:

$$ C = \begin{bmatrix} 0 & 0 & \cdots & 0 & -c_0 \\ 1 & 0 & \cdots & 0 & -c_1 \\ 0 & 1 & \cdots & 0 & -c_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -c_{n-1} \end{bmatrix} $$    (7.4.4)

The matrix C is said to be a companion matrix. Since

$$ \det(zI - C) = c_0 + c_1 z + \cdots + c_{n-1} z^{n-1} + z^n $$

it follows that if K is nonsingular, then the decomposition $K^{-1} A K = C$
displays A's characteristic polynomial. This, coupled with the sparseness
of C, has led to "companion matrix methods" in various application areas.
These techniques typically involve:


• Computing the Hessenberg decomposition $U_0^T A U_0 = H$.
• Hoping H is unreduced and setting $Y = [\, e_1, H e_1, \ldots, H^{n-1} e_1 \,]$.
• Solving $YC = HY$ for C.

Unfortunately, this calculation can be highly unstable. A is similar to an
unreduced Hessenberg matrix only if each eigenvalue has unit geometric
multiplicity. Matrices that have this property are called nonderogatory. It
follows that the matrix Y above can be very poorly conditioned if A is close
to a derogatory matrix.
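The relation $AK = KC$ can be checked on a small example. The sketch below is illustrative and makes several assumptions: the matrix and starting vector are chosen for convenience, the linear system $Kc = -A^n x$ is solved by a plain Gaussian elimination with partial pivoting, and all helper names are hypothetical. It is meant to show the mechanics, not to recommend the computation, which the text warns can be highly unstable.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [M[i][:] + [b[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            m = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= m * aug[k][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (aug[i][n] - sum(aug[i][j] * y[j] for j in range(i + 1, n))) / aug[i][i]
    return y

def companion_from_krylov(A, x):
    """Return (K, c, C) with K = K(A, x, n), K c = -A^n x, and C as in (7.4.4)."""
    n = len(x)
    cols = [list(x)]
    for _ in range(n):
        cols.append(matvec(A, cols[-1]))         # cols[j] = A^j x
    K = [[cols[j][i] for j in range(n)] for i in range(n)]
    c = solve(K, [-t for t in cols[n]])
    C = [[0.0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1.0                        # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -c[i]                      # last column holds -c
    return K, c, C

A = [[3.0, 1.0, 2.0],
     [4.0, 2.0, 3.0],
     [0.0, 0.01, 1.0]]
K, c, C = companion_from_krylov(A, [1.0, 0.0, 0.0])
AK, KC = matmul(A, K), matmul(K, C)
print(c)
```

For this matrix the computed coefficients correspond to the characteristic polynomial $z^3 - 6z^2 + 6.97z - 1.99$, whose $z^2$ coefficient is minus the trace and whose constant term is minus the determinant.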

A full discussion of the dangers associated with companion matrix com- 
putation can be found in Wilkinson (1965, pp. 405 ff.). 


7.4.7 Hessenberg Reduction Via Gauss Transforms 


While we are on the subject of nonorthogonal reduction to Hessenberg
form, we should mention that Gauss transformations can be used in lieu
of Householder matrices in Algorithm 7.4.2. In particular, suppose permutations
$\Pi_1, \ldots, \Pi_{k-1}$ and Gauss transformations $M_1, \ldots, M_{k-1}$ have been
determined such that

$$ (M_{k-1} \Pi_{k-1} \cdots M_1 \Pi_1) A (M_{k-1} \Pi_{k-1} \cdots M_1 \Pi_1)^{-1} = B $$

where

$$ B = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{bmatrix} $$

(with block rows and block columns of sizes k-1, 1, and n-k)
is upper Hessenberg through its first k-1 columns. A permutation $\bar{\Pi}_k$
of order n-k is then determined such that the first element of $\bar{\Pi}_k B_{32}$ is
maximal in absolute value. This makes it possible to determine a stable
Gauss transformation $\bar{M}_k = I - z_k e_1^T$ also of order n-k, such that all but
the first component of $\bar{M}_k (\bar{\Pi}_k B_{32})$ is zero. Defining $\Pi_k = \mathrm{diag}(I_k, \bar{\Pi}_k)$ and
$M_k = \mathrm{diag}(I_k, \bar{M}_k)$, we see that

$$ (M_k \Pi_k \cdots M_1 \Pi_1) A (M_k \Pi_k \cdots M_1 \Pi_1)^{-1} = \begin{bmatrix} B_{11} & B_{12} & B_{13} \bar{\Pi}_k^T \bar{M}_k^{-1} \\ B_{21} & B_{22} & B_{23} \bar{\Pi}_k^T \bar{M}_k^{-1} \\ 0 & \bar{M}_k \bar{\Pi}_k B_{32} & \bar{M}_k \bar{\Pi}_k B_{33} \bar{\Pi}_k^T \bar{M}_k^{-1} \end{bmatrix} $$

is upper Hessenberg through its first k columns. Note that $\bar{M}_k^{-1} = I + z_k e_1^T$
and so some very simple rank-one updates are involved in the reduction.
A careful operation count reveals that the Gauss reduction to Hessenberg
form requires only half the number of flops of the Householder method.
However, as in the case of Gaussian elimination with partial pivoting, there
is a (fairly remote) chance of $2^n$ growth. See Businger (1969). Another difficulty
associated with the Gauss approach is that the eigenvalue condition
numbers, the $s(\lambda)^{-1}$, are not preserved with nonorthogonal similarity
transformations and this complicates the error estimation process.


Problems 


P7.4.1 Suppose $A \in \mathbb{R}^{n \times n}$ and $z \in \mathbb{R}^n$. Give a detailed algorithm for computing an
orthogonal Q such that $Q^T A Q$ is upper Hessenberg and $Q^T z$ is a multiple of $e_1$. (Hint:
Reduce z first and then apply Algorithm 7.4.2.)


P7.4.2 Specify a complete reduction to Hessenberg form using Gauss transformations
and verify that it only requires $5n^3/3$ flops.


P7.4.3 In some situations, it is necessary to solve the linear system $(A + zI)x = b$ for
many different values of $z \in \mathbb{R}$ and $b \in \mathbb{R}^n$. Show how this problem can be efficiently
and stably solved using the Hessenberg decomposition.


P7.4.4 Give a detailed algorithm for explicitly computing the matrix $U_0$ in Algorithm
7.4.2. Design your algorithm so that H is overwritten by $U_0$.


P7.4.5 Suppose $H \in \mathbb{R}^{n \times n}$ is an unreduced upper Hessenberg matrix. Show that there
exists a diagonal matrix D such that each subdiagonal element of $D^{-1} H D$ is equal to
one. What is $\kappa_2(D)$?


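P7.4.6 Suppose $W, Y \in \mathbb{R}^{n\times n}$ and define the matrices $C$ and $B$ by

\[ C = W + iY, \qquad B = \begin{bmatrix} W & -Y \\ Y & W \end{bmatrix}. \]

Show that if $\lambda \in \lambda(C)$ is real, then $\lambda \in \lambda(B)$. Relate the corresponding eigenvectors.


P7.4.7 Suppose $A = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$ is a real matrix having eigenvalues $\lambda \pm i\mu$, where $\mu$ is
nonzero. Give an algorithm that stably determines $c = \cos(\theta)$ and $s = \sin(\theta)$ such that

\[ \begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} w & x \\ y & z \end{bmatrix} \begin{bmatrix} c & s \\ -s & c \end{bmatrix} = \begin{bmatrix} \lambda & \beta \\ \alpha & \lambda \end{bmatrix} \]

where $\alpha\beta = -\mu^2$.


P7.4.8 Suppose $(\lambda, x)$ is a known eigenvalue-eigenvector pair for the upper Hessenberg
matrix $H \in \mathbb{R}^{n\times n}$. Give an algorithm for computing an orthogonal matrix $P$ such that

\[ P^THP = \begin{bmatrix} \lambda & w^T \\ 0 & H_1 \end{bmatrix} \]

where $H_1 \in \mathbb{R}^{(n-1)\times(n-1)}$ is upper Hessenberg. Compute $P$ as a product of Givens
rotations.

P7.4.9 Suppose $H \in \mathbb{R}^{n\times n}$ has lower bandwidth $p$. Show how to compute $Q \in \mathbb{R}^{n\times n}$,
a product of Givens rotations, such that $Q^THQ$ is upper Hessenberg. How many flops
are required?

P7.4.10 Show that if $C$ is a companion matrix with distinct eigenvalues $\lambda_1,\ldots,\lambda_n$,
then $VCV^{-1} = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ where

\[ V = \begin{bmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{n-1} \\ 1 & \lambda_2 & \cdots & \lambda_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \lambda_n & \cdots & \lambda_n^{n-1} \end{bmatrix}. \]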


Notes and References for Sec. 7.4 


The real Schur decomposition was originally presented in

F.D. Murnaghan and A. Wintner (1931). “A Canonical Form for Real Matrices Under
Orthogonal Transformations,” Proc. Nat. Acad. Sci. 17, 417-20.


A thorough treatment of the reduction to Hessenberg form is given in Wilkinson (1965, 
chapter 6), and Algol procedures for both the Householder and Gauss methods appear in 


R.S. Martin and J.H. Wilkinson (1968). “Similarity Reduction of a General Matrix to 
Hessenberg Form,” Numer. Math. 12, 349-68. See also Wilkinson and Reinsch 
(1971, pp.339—58). 


Fortran versions of the Algol procedures in the last reference are in Eispack. 
Givens rotations can also be used to compute the Hessenberg decomposition. See 


W. Rath (1982). “Fast Givens Rotations for Orthogonal Similarity,” Numer. Math. 40,
47-56.


The high performance computation of the Hessenberg reduction is discussed in 


J.J. Dongarra, L. Kaufman, and S. Hammarling (1986). “Squeezing the Most Out of 
Eigenvalue Solvers on High Performance Computers,” Lin. Alg. and Its Applic. 77, 
113-136. 

J.J. Dongarra, S. Hammarling, and D.C. Sorensen (1989). “Block Reduction of Matrices 
to Condensed Forms for Eigenvalue Computations,” JACM 27, 215-227. 

M.W. Berry, J.J. Dongarra, and Y. Kim (1995). “A Parallel Algorithm for the Reduction 
of a Nonsymmetric Matrix to Block Upper Hessenberg Form,” Parallel Computing 
21, 1189-1211. 


The possibility of exponential growth in the Gauss transformation approach was first 
pointed out in 


P. Businger (1969). “Reducing a Matrix to Hessenberg Form," Math. Comp. 23, 819-21. 


However, the algorithm should be regarded in the same light as Gaussian elimination 
with partial pivoting—stable for all practical purposes. See Eispack, pp. 56-58. 


Aspects of the Hessenberg decomposition for sparse matrices are discussed in 


I.S. Duff and J.K. Reid (1975). “On the Reduction of Sparse Matrices to Condensed
Forms by Similarity Transformations,” J. Inst. Math. Applic. 15, 217-24.


Once an eigenvalue of an unreduced upper Hessenberg matrix is known, it is possible to 
zero the last subdiagonal entry using Givens similarity transformations. See 


P.A. Businger (1971). “Numerically Stable Deflation of Hessenberg and Symmetric
Tridiagonal Matrices,” BIT 11, 262-70.


Some interesting mathematical properties of the Hessenberg form may be found in 


B.N. Parlett (1967). “Canonical Decomposition of Hessenberg Matrices,” Math. Comp. 
21, 223-27. 

Y. Ikebe (1979). “On Inverses of Hessenberg Matrices,” Lin. Alg. and Its Applic. 24, 
93-97. 


Although the Hessenberg decomposition is largely appreciated as a “front end”
decomposition for the QR iteration, it is increasingly popular as a cheap alternative to the
more expensive Schur decomposition in certain problems. For a sampling of applications
where it has proven to be very useful, consult


W. Enright (1979). “On the Efficient and Reliable Numerical Solution of Large Linear 
Systems of O.D.E.’s,” IEEE Trans. Auto. Cont. AC-24, 905-8. 
Algorithms and Theory in Filtering and Control, D.C. Sorensen and R.J. Wets


The advisability of posing polynomial root problems as companion matrix eigenvalue
problems is discussed in


K.-C. Toh and L.N. Trefethen (1994). “Pseudozeros of Polynomials and Pseudospectra 
of Companion Matrices," Numer. Math. 68, 403—425. 


7.5 The Practical QR Algorithm


We return to the Hessenberg QR iteration which we write as follows:


    $H = U_0^TAU_0$            (Hessenberg Reduction)
    for $k = 1, 2, \ldots$
        $H = UR$               (QR factorization)              (7.5.1)
        $H = RU$
    end


Our aim in this section is to describe how the H’s converge to upper quasi- 
triangular form and to show how the convergence rate can be accelerated 
by incorporating shifts. 


7.5.1 Deflation 


Without loss of generality we may assume that each Hessenberg matrix $H$
in (7.5.1) is unreduced. If not, then at some stage we have

\[ H = \begin{bmatrix} H_{11} & H_{12} \\ 0 & H_{22} \end{bmatrix} \]

(row and column blocks of sizes $p$ and $n-p$)
where $1 \le p < n$ and the problem decouples into two smaller problems
involving $H_{11}$ and $H_{22}$. The term deflation is also used in this context,
usually when $p = n-1$ or $n-2$.

In practice, decoupling occurs whenever a subdiagonal entry in $H$ is
suitably small. For example, in Eispack if

\[ |h_{p+1,p}| \le c\,u\,(|h_{pp}| + |h_{p+1,p+1}|) \tag{7.5.2} \]

for a small constant $c$, then $h_{p+1,p}$ is “declared” to be zero. This is justified
since rounding errors of order $u\|H\|$ are already present throughout the
matrix.
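
A minimal sketch of the test (7.5.2) follows. The function name `deflate_small_subdiagonals`, the default `c = 1.0`, and the choice of IEEE double precision for the unit roundoff `u` are assumptions made for illustration only.

```python
import numpy as np

def deflate_small_subdiagonals(H, c=1.0):
    """Zero every subdiagonal entry of H that satisfies criterion (7.5.2)."""
    H = np.array(H, dtype=float)
    u = np.finfo(float).eps / 2          # unit roundoff (assumed: IEEE double)
    n = H.shape[0]
    for p in range(n - 1):
        bound = c * u * (abs(H[p, p]) + abs(H[p + 1, p + 1]))
        if abs(H[p + 1, p]) <= bound:
            H[p + 1, p] = 0.0            # "declare" the entry to be zero
    return H

H_demo = np.array([[1.0, 2.0, 3.0],
                   [1e-20, 4.0, 5.0],
                   [0.0, 0.5, 6.0]])
H_deflated = deflate_small_subdiagonals(H_demo)
```

Here the (2,1) entry is negligible relative to its diagonal neighbours and is set to zero, while the (3,2) entry is left alone.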


7.5.2 The Shifted QR Iteration 


Let $\mu \in \mathbb{R}$ and consider the iteration:


    $H = U_0^TAU_0$            (Hessenberg Reduction)
    for $k = 1, 2, \ldots$
        Determine a scalar $\mu$.
        $H - \mu I = UR$       (QR factorization)              (7.5.3)
        $H = RU + \mu I$
    end


The scalar $\mu$ is referred to as a shift. Each matrix $H$ generated in (7.5.3)
is similar to $A$, since $RU + \mu I = U^T(UR + \mu I)U = U^THU$. If we order
the eigenvalues $\lambda_i$ of $A$ so that

\[ |\lambda_1 - \mu| \ge \cdots \ge |\lambda_n - \mu|, \]

and $\mu$ is fixed from iteration to iteration, then the theory of §7.3 says that
the $p$th subdiagonal entry in $H$ converges to zero with rate

\[ \left| \frac{\lambda_{p+1} - \mu}{\lambda_p - \mu} \right|^k. \]

Of course, if $\lambda_p = \lambda_{p+1}$, then there is no convergence at all. But if, for
example, $\mu$ is much closer to $\lambda_n$ than to the other eigenvalues, then the
zeroing of the $(n, n-1)$ entry is rapid. In the extreme case we have the
following:


Theorem 7.5.1 Let $\mu$ be an eigenvalue of an n-by-n unreduced Hessenberg
matrix $H$. If $\bar H = RU + \mu I$, where $H - \mu I = UR$ is the QR factorization
of $H - \mu I$, then $\bar h_{n,n-1} = 0$ and $\bar h_{nn} = \mu$.


Proof. Since $H$ is an unreduced Hessenberg matrix the first $n-1$ columns
of $H - \mu I$ are independent, regardless of $\mu$. Thus, if $UR = (H - \mu I)$ is the
QR factorization then $r_{ii} \ne 0$ for $i = 1{:}n-1$. But if $H - \mu I$ is singular then
$r_{11}\cdots r_{nn} = 0$. Thus, $r_{nn} = 0$ and $\bar H(n,:) = [0,\ldots,0,\mu]$. □


The theorem says that if we shift by an exact eigenvalue, then in exact 
arithmetic deflation occurs in one step. 


Example 7.5.1 If

\[ H = \begin{bmatrix} 9 & -1 & -2 \\ 2 & 6 & -2 \\ 0 & 1 & 5 \end{bmatrix} \]

then $6 \in \lambda(H)$. If $UR = H - 6I$ is the QR factorization, then $\bar H = RU + 6I$ is given by

\[ \bar H = \begin{bmatrix} 8.5384 & -3.7313 & -1.0090 \\ 0.6343 & 5.4615 & 1.3867 \\ 0.0000 & 0.0000 & 6.0000 \end{bmatrix}. \]
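
The one-step deflation promised by Theorem 7.5.1 is easy to observe numerically. The sketch below is illustrative only: `shifted_qr_step` is a hypothetical helper name, and the 3-by-3 matrix is an assumption chosen so that 6 is an eigenvalue. Since a QR factorization is unique only up to signs, the off-diagonal entries of the result may differ in sign from a hand computation.

```python
import numpy as np

def shifted_qr_step(H, mu):
    """One explicit shifted QR step: H - mu*I = U R, return R U + mu*I."""
    n = H.shape[0]
    U, R = np.linalg.qr(H - mu * np.eye(n))
    return R @ U + mu * np.eye(n)

# Assumed test matrix: unreduced upper Hessenberg with det(H - 6I) = 0.
H = np.array([[9.0, -1.0, -2.0],
              [2.0, 6.0, -2.0],
              [0.0, 1.0, 5.0]])
H_bar = shifted_qr_step(H, 6.0)
```

After the step the last row of `H_bar` is, up to roundoff, `[0, 0, 6]`, so the eigenvalue has been deflated.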


7.5.3 The Single Shift Strategy 


Now let us consider varying $\mu$ from iteration to iteration incorporating new
information about $\lambda(A)$ as the subdiagonal entries converge to zero. A
good heuristic is to regard $h_{nn}$ as the best approximate eigenvalue along
the diagonal. If we shift by this quantity during each iteration, we obtain
the single-shift QR iteration:


    for $k = 1, 2, \ldots$
        $\mu = H(n, n)$
        $H - \mu I = UR$       (QR Factorization)              (7.5.4)
        $H = RU + \mu I$
    end


If the $(n, n-1)$ entry converges to zero, it is likely to do so at a quadratic
rate. To see this, we borrow an example from Stewart (1973, p. 366).
Suppose $H$ is an unreduced upper Hessenberg matrix of the form

\[ H = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \epsilon & h_{nn} \end{bmatrix} \]

and that we perform one step of the single-shift QR algorithm: $UR = H - h_{nn}I$,
$\bar H = RU + h_{nn}I$. After $n-2$ steps in the reduction of $H - h_{nn}I$
to upper triangular form we obtain a matrix with the following structure:

\[ \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & a & b \\ 0 & 0 & 0 & \epsilon & 0 \end{bmatrix} \]

It is not hard to show that the $(n, n-1)$ entry in $\bar H = RU + h_{nn}I$ is
given by $-\epsilon^2 b/(\epsilon^2 + a^2)$. If we assume that $\epsilon \ll a$, then it is clear that
the new $(n, n-1)$ entry has order $\epsilon^2$, precisely what we would expect of a
quadratically converging algorithm.


Example 7.5.2 If

\[ H = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 0 & .001 & 7 \end{bmatrix} \]

and $UR = H - 7I$ is the QR factorization, then $\bar H = RU + 7I$ is given by

\[ \bar H \approx \begin{bmatrix} -0.5384 & 1.6908 & 0.8351 \\ 0.3076 & 6.5264 & -6.6555 \\ 0.0000 & 2\cdot 10^{-5} & 7.0119 \end{bmatrix}. \]

Near-perfect shifts as above almost always ensure a small $\bar h_{n,n-1}$. However, this is just
a heuristic. There are examples in which $\bar h_{n,n-1}$ is a relatively large matrix entry even
though $\sigma_{\min}(H - \mu I) \approx u$.
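
The rapid decay of the $(n, n-1)$ entry under the single-shift strategy can be watched directly. The sketch below is an explicit (not implicit) version of (7.5.4) applied to a 3-by-3 Hessenberg matrix with a small last subdiagonal entry; the helper name `single_shift_qr_step` and the choice of three steps are assumptions for illustration.

```python
import numpy as np

def single_shift_qr_step(H):
    """One explicit step of (7.5.4) with the shift mu = H(n, n)."""
    n = H.shape[0]
    mu = H[n - 1, n - 1]
    U, R = np.linalg.qr(H - mu * np.eye(n))
    return R @ U + mu * np.eye(n)

H = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [0.0, 0.001, 7.0]])
history = [abs(H[2, 1])]             # magnitude of the (n, n-1) entry
for _ in range(3):
    H = single_shift_qr_step(H)
    history.append(abs(H[2, 1]))
```

Each entry of `history` is roughly proportional to the square of its predecessor, which is the quadratic behaviour predicted by the $-\epsilon^2 b/(\epsilon^2 + a^2)$ formula.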


7.5.4 The Double Shift Strategy


Unfortunately, difficulties with (7.5.4) can be expected if at some stage the
eigenvalues $a_1$ and $a_2$ of

\[ G = \begin{bmatrix} h_{mm} & h_{mn} \\ h_{nm} & h_{nn} \end{bmatrix}, \qquad m = n-1 \tag{7.5.5} \]

are complex for then $h_{nn}$ would tend to be a poor approximate eigenvalue.
A way around this difficulty is to perform two single-shift QR steps in
succession using $a_1$ and $a_2$ as shifts:

\[ \begin{aligned} H - a_1 I &= U_1R_1 \\ H_1 &= R_1U_1 + a_1 I \\ H_1 - a_2 I &= U_2R_2 \\ H_2 &= R_2U_2 + a_2 I \end{aligned} \tag{7.5.6} \]

These equations can be manipulated to show that

\[ (U_1U_2)(R_2R_1) = M \tag{7.5.7} \]

where $M$ is defined by

\[ M = (H - a_1 I)(H - a_2 I). \tag{7.5.8} \]

Note that $M$ is a real matrix even if $G$'s eigenvalues are complex since

\[ M = H^2 - sH + tI \]

where

\[ s = a_1 + a_2 = h_{mm} + h_{nn} = \mathrm{trace}(G) \in \mathbb{R} \]

and

\[ t = a_1a_2 = h_{mm}h_{nn} - h_{mn}h_{nm} = \det(G) \in \mathbb{R}. \]


Thus, (7.5.7) is the QR factorization of a real matrix and we may choose
$U_1$ and $U_2$ so that $Z = U_1U_2$ is real orthogonal. It then follows that

\[ H_2 = U_2^HH_1U_2 = U_2^H(U_1^HHU_1)U_2 = (U_1U_2)^HH(U_1U_2) = Z^THZ \]

is real.
Unfortunately, roundoff error almost always prevents an exact return to
the real field. A real $H_2$ could be guaranteed if we


• explicitly form the real matrix $M = H^2 - sH + tI$,
• compute the real QR factorization $M = ZR$, and
• set $H_2 = Z^THZ$.


But since the first of these steps requires $O(n^3)$ flops, this is not a practical
course of action.
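
For small matrices the three explicit steps just listed are nevertheless a convenient reference computation. The following sketch is only an illustration; the function name `explicit_double_shift_step` and the 4-by-4 test matrix (whose trailing 2-by-2 block has complex eigenvalues) are assumptions.

```python
import numpy as np

def explicit_double_shift_step(H):
    """Form M = H^2 - sH + tI explicitly, factor M = ZR, return Z^T H Z."""
    n = H.shape[0]
    m = n - 2                                     # 0-based index of row n-1
    s = H[m, m] + H[n - 1, n - 1]                 # trace of trailing block G
    t = H[m, m] * H[n - 1, n - 1] - H[m, n - 1] * H[n - 1, m]   # det(G)
    M = H @ H - s * H + t * np.eye(n)             # real even for complex shifts
    Z, _ = np.linalg.qr(M)
    return Z.T @ H @ Z

H = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0],
              [0.0, 9.0, 1.0, 2.0],
              [0.0, 0.0, -3.0, 1.0]])
H2 = explicit_double_shift_step(H)
```

The result stays in real arithmetic, is upper Hessenberg up to roundoff, and is orthogonally similar to `H`. The matrix product that forms `M` is the $O(n^3)$ cost that the implicit approach avoids.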


7.5.5 The Double Implicit Shift Strategy 


Fortunately, it turns out that we can implement the double shift step with
$O(n^2)$ flops by appealing to the Implicit Q Theorem of §7.4.5. In particular
we can effect the transition from $H$ to $H_2$ in $O(n^2)$ flops if we


• compute $Me_1$, the first column of $M$;


• determine a Householder matrix $P_0$ such that $P_0(Me_1)$ is a multiple
of $e_1$;


• compute Householder matrices $P_1,\ldots,P_{n-2}$ such that if $Z_1$ is the
product $Z_1 = P_0P_1\cdots P_{n-2}$, then $Z_1^THZ_1$ is upper Hessenberg and
the first columns of $Z$ and $Z_1$ are the same.


Under these circumstances, the Implicit Q theorem permits us to conclude
that if $Z^THZ$ and $Z_1^THZ_1$ are both unreduced upper Hessenberg matrices,
then they are essentially equal. Note that if these Hessenberg matrices are
not unreduced, then we can effect a decoupling and proceed with smaller
unreduced subproblems.

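Let us work out the details. Observe first that $P_0$ can be determined in
$O(1)$ flops since $Me_1 = [x, y, z, 0, \ldots, 0]^T$ where

\[ \begin{aligned} x &= h_{11}^2 + h_{12}h_{21} - sh_{11} + t \\ y &= h_{21}(h_{11} + h_{22} - s) \\ z &= h_{21}h_{32}. \end{aligned} \]

Since a similarity transformation with $P_0$ only changes rows and columns
1, 2, and 3, we see that

\[ P_0HP_0 = \begin{bmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \end{bmatrix}. \]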
Now the mission of the Householder matrices $P_1,\ldots,P_{n-2}$ is to restore this
matrix to upper Hessenberg form. The calculation proceeds as follows:

    Start           after P_1       after P_2       after P_3       after P_4
    × × × × × ×     × × × × × ×     × × × × × ×     × × × × × ×     × × × × × ×
    × × × × × ×     × × × × × ×     × × × × × ×     × × × × × ×     × × × × × ×
    × × × × × ×     0 × × × × ×     0 × × × × ×     0 × × × × ×     0 × × × × ×
    × × × × × ×     0 × × × × ×     0 0 × × × ×     0 0 × × × ×     0 0 × × × ×
    0 0 0 × × ×     0 × × × × ×     0 0 × × × ×     0 0 0 × × ×     0 0 0 × × ×
    0 0 0 0 × ×     0 0 0 0 × ×     0 0 × × × ×     0 0 0 × × ×     0 0 0 0 × ×


Clearly, the general $P_k$ has the form $P_k = \mathrm{diag}(I_k, \bar P_k, I_{n-k-3})$ where $\bar P_k$ is
a 3-by-3 Householder matrix. For example,

\[ P_2 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & \times & \times & \times & 0 \\ 0 & 0 & \times & \times & \times & 0 \\ 0 & 0 & \times & \times & \times & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}. \]

Note that $P_{n-2}$ is an exception to this since $P_{n-2} = \mathrm{diag}(I_{n-2}, \bar P_{n-2})$.
The applicability of Theorem 7.4.3 (the Implicit Q theorem) follows
from the observation that $P_ke_1 = e_1$ for $k = 1{:}n-2$ and that $P_0$ and $Z$
have the same first column. Hence, $Z_1e_1 = Ze_1$, and we can assert that $Z_1$
essentially equals $Z$ provided that the upper Hessenberg matrices $Z^THZ$
and $Z_1^THZ_1$ are each unreduced.

The implicit determination of $H_2$ from $H$ outlined above was first
described by Francis (1961) and we refer to it as a Francis QR step. The
complete Francis step is summarized as follows:


Algorithm 7.5.1 (Francis QR Step) Given the unreduced upper Hessenberg
matrix $H \in \mathbb{R}^{n\times n}$ whose trailing 2-by-2 principal submatrix has
eigenvalues $a_1$ and $a_2$, this algorithm overwrites $H$ with $Z^THZ$, where $Z =
P_0\cdots P_{n-2}$ is a product of Householder matrices and $Z^T(H - a_1I)(H - a_2I)$
is upper triangular.


    m = n - 1
    {Compute first column of (H - a_1 I)(H - a_2 I).}
    s = H(m,m) + H(n,n)
    t = H(m,m)H(n,n) - H(m,n)H(n,m)
    x = H(1,1)H(1,1) + H(1,2)H(2,1) - sH(1,1) + t
    y = H(2,1)(H(1,1) + H(2,2) - s)
    z = H(2,1)H(3,2)
    for k = 0:n-3
        [v, β] = house([x y z]^T)
        q = max{1, k}
        H(k+1:k+3, q:n) = (I - βvv^T)H(k+1:k+3, q:n)
        r = min{k+4, n}
        H(1:r, k+1:k+3) = H(1:r, k+1:k+3)(I - βvv^T)
        x = H(k+2, k+1)
        y = H(k+3, k+1)
        if k < n-3
            z = H(k+4, k+1)
        end
    end
    [v, β] = house([x y]^T)
    H(n-1:n, n-2:n) = (I - βvv^T)H(n-1:n, n-2:n)
    H(1:n, n-1:n) = H(1:n, n-1:n)(I - βvv^T)


This algorithm requires $10n^2$ flops. If $Z$ is accumulated into a given
orthogonal matrix, an additional $10n^2$ flops are necessary.
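
A direct transcription of Algorithm 7.5.1 into NumPy is sketched below, with the 1-based index ranges shifted to 0-based slices. It is an illustration, not a tuned implementation: the helper `house` here returns an unnormalized Householder vector (an assumption that differs from a version normalized so that `v[0] = 1`), the function assumes $n \ge 3$, and the test matrix is hypothetical.

```python
import numpy as np

def house(x):
    """Return (v, beta) with (I - beta v v^T) x a multiple of e_1."""
    v = np.array(x, dtype=float)
    norm_x = np.linalg.norm(v)
    if norm_x == 0.0:
        return v, 0.0
    v[0] += np.copysign(norm_x, v[0])     # sign chosen to avoid cancellation
    return v, 2.0 / (v @ v)

def francis_step(H):
    """One implicit double-shift (Francis) QR step on an unreduced Hessenberg H."""
    H = np.array(H, dtype=float)
    n = H.shape[0]                        # assumes n >= 3
    s = H[n-2, n-2] + H[n-1, n-1]
    t = H[n-2, n-2] * H[n-1, n-1] - H[n-2, n-1] * H[n-1, n-2]
    # First column of M = H^2 - sH + tI has only three nonzeros x, y, z.
    x = H[0, 0] * H[0, 0] + H[0, 1] * H[1, 0] - s * H[0, 0] + t
    y = H[1, 0] * (H[0, 0] + H[1, 1] - s)
    z = H[1, 0] * H[2, 1]
    for k in range(n - 2):                # k = 0:n-3 in the text
        v, beta = house([x, y, z])
        q = max(1, k)                     # 1-based first affected column
        rows = slice(k, k + 3)            # 1-based rows k+1:k+3
        H[rows, q-1:] -= beta * np.outer(v, v @ H[rows, q-1:])
        r = min(k + 4, n)
        H[:r, rows] -= beta * np.outer(H[:r, rows] @ v, v)
        x = H[k+1, k]
        y = H[k+2, k]
        if k < n - 3:
            z = H[k+3, k]
    v, beta = house([x, y])               # final 2-by-2 reflection
    H[n-2:, n-3:] -= beta * np.outer(v, v @ H[n-2:, n-3:])
    H[:, n-2:] -= beta * np.outer(H[:, n-2:] @ v, v)
    return H

H = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0],
              [0.0, 9.0, 1.0, 2.0],
              [0.0, 0.0, -3.0, 1.0]])
H_next = francis_step(H)
```

Entries below the subdiagonal of `H_next` are left at roundoff level rather than being set to zero. By the Implicit Q theorem, `H_next` should agree entry by entry in absolute value with $Z^THZ$ computed from an explicit QR factorization of $M$.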


7.5.6 The Overall Process 


Reducing A to Hessenberg form using Algorithm 7.4.2 and then iterating 
with Algorithm 7.5.1 to produce the real Schur form is the standard means 
by which the dense unsymmetric eigenproblem is solved. During the iteration
it is necessary to monitor the subdiagonal elements in $H$ in order to
spot any possible decoupling. How this is done is illustrated in the following
algorithm:


Algorithm 7.5.2 (QR Algorithm) Given $A \in \mathbb{R}^{n\times n}$ and a tolerance
tol greater than the unit roundoff, this algorithm computes the real Schur
canonical form $Q^TAQ = T$. $A$ is overwritten with the Hessenberg decomposition.
If $Q$ and $T$ are desired, then $T$ is stored in $H$. If only the eigenvalues
are desired, then diagonal blocks in $T$ are stored in the corresponding
positions in $H$.


    Use Algorithm 7.4.2 to compute the Hessenberg reduction
        H = U_0^T A U_0 where U_0 = P_1 ··· P_{n-2}.
    If Q is desired form Q = P_1 ··· P_{n-2}. See §5.1.6.
    until q = n
        Set to zero all subdiagonal elements that satisfy:
            |h_{i,i-1}| ≤ tol(|h_{ii}| + |h_{i-1,i-1}|).
        Find the largest non-negative q and the smallest
        non-negative p such that

                [ H_11  H_12  H_13 ]   p
            H = [  0    H_22  H_23 ]   n-p-q
                [  0     0    H_33 ]   q
                   p    n-p-q   q

        where H_33 is upper quasi-triangular and H_22 is
        unreduced. (Note: either p or q may be zero.)
        if q < n
            Perform a Francis QR step on H_22: H_22 = Z^T H_22 Z
            if Q is desired
                Q = Q diag(I_p, Z, I_q)
                H_12 = H_12 Z
                H_23 = Z^T H_23
            end
        end
    end
    Upper triangularize all 2-by-2 diagonal blocks in H that have
    real eigenvalues and accumulate the transformations
    if necessary.


This algorithm requires $25n^3$ flops if $Q$ and $T$ are computed. If only the
eigenvalues are desired, then $10n^3$ flops are necessary. These flop counts
are very approximate and are based on the empirical observation that on
average only two Francis iterations are required before the lower 1-by-1 or
2-by-2 decouples.
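
The eigenvalues-only variant of Algorithm 7.5.2 can be sketched as follows for a matrix that is already upper Hessenberg. Everything here is an assumption made for illustration: the function names, the default tolerance, the step limit, and the absence of exceptional shifts (so convergence is not guaranteed for every input). A compact transcription of the Francis step is included so that the block runs on its own.

```python
import numpy as np

def house(x):
    v = np.array(x, dtype=float)
    nrm = np.linalg.norm(v)
    if nrm == 0.0:
        return v, 0.0
    v[0] += np.copysign(nrm, v[0])
    return v, 2.0 / (v @ v)

def francis_step(H):
    H = np.array(H, dtype=float)
    n = H.shape[0]                                   # assumes n >= 3
    s = H[n-2, n-2] + H[n-1, n-1]
    t = H[n-2, n-2] * H[n-1, n-1] - H[n-2, n-1] * H[n-1, n-2]
    x = H[0, 0] ** 2 + H[0, 1] * H[1, 0] - s * H[0, 0] + t
    y = H[1, 0] * (H[0, 0] + H[1, 1] - s)
    z = H[1, 0] * H[2, 1]
    for k in range(n - 2):
        v, beta = house([x, y, z])
        q, rows, r = max(1, k), slice(k, k + 3), min(k + 4, n)
        H[rows, q-1:] -= beta * np.outer(v, v @ H[rows, q-1:])
        H[:r, rows] -= beta * np.outer(H[:r, rows] @ v, v)
        x, y = H[k+1, k], H[k+2, k]
        if k < n - 3:
            z = H[k+3, k]
    v, beta = house([x, y])
    H[n-2:, n-3:] -= beta * np.outer(v, v @ H[n-2:, n-3:])
    H[:, n-2:] -= beta * np.outer(H[:, n-2:] @ v, v)
    return H

def active_block(H):
    """Return (lo, hi): H[lo:hi, lo:hi] is the unreduced block H_22 (hi = n - q)."""
    hi = H.shape[0]
    while hi > 0:                    # strip converged 1x1 and 2x2 trailing blocks
        if hi == 1 or H[hi-1, hi-2] == 0.0:
            hi -= 1
        elif hi == 2 or H[hi-2, hi-3] == 0.0:
            hi -= 2
        else:
            break
    lo = max(hi - 1, 0)
    while lo > 0 and H[lo, lo-1] != 0.0:
        lo -= 1
    return lo, hi

def hessenberg_qr_eigenvalues(H, tol=1e-14, max_steps=500):
    H = np.array(H, dtype=float)
    n = H.shape[0]
    for _ in range(max_steps):
        for i in range(1, n):        # deflation test of Algorithm 7.5.2
            if abs(H[i, i-1]) <= tol * (abs(H[i, i]) + abs(H[i-1, i-1])):
                H[i, i-1] = 0.0
        lo, hi = active_block(H)
        if hi == 0:                  # q = n: quasi-triangular form reached
            break
        H[lo:hi, lo:hi] = francis_step(H[lo:hi, lo:hi])
    else:
        raise RuntimeError("no convergence within max_steps")
    eigs, i = [], 0
    while i < n:                     # read eigenvalues off the diagonal blocks
        if i + 1 < n and H[i+1, i] != 0.0:
            eigs.extend(np.linalg.eigvals(H[i:i+2, i:i+2]))
            i += 2
        else:
            eigs.append(H[i, i])
            i += 1
    return np.array(eigs)

A = np.array([[2.0, 3, 4, 5, 6],
              [4.0, 4, 5, 6, 7],
              [0.0, 3, 6, 7, 8],
              [0.0, 0, 2, 8, 9],
              [0.0, 0, 0, 1, 10]])
eigs = hessenberg_qr_eigenvalues(A)
```

Only the diagonal block $H_{22}$ is updated at each step, which is enough when eigenvalues alone are wanted. Computing $Q$ and $T$ would also require the updates of $H_{12}$, $H_{23}$, and $Q$ shown in the algorithm.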


Example 7.5.3 If Algorithm 7.5.2 is applied to

\[ A = \begin{bmatrix} 2 & 3 & 4 & 5 & 6 \\ 4 & 4 & 5 & 6 & 7 \\ 0 & 3 & 6 & 7 & 8 \\ 0 & 0 & 2 & 8 & 9 \\ 0 & 0 & 0 & 1 & 10 \end{bmatrix}, \]

then the subdiagonal entries converge as follows

    Iteration   O(|h21|)    O(|h32|)    O(|h43|)    O(|h54|)
        1       10^0        10^0        10^0        10^0
        2       10^0        10^0        10^0        10^0
        3       10^0        10^0        10^-1       10^0
        4       10^0        10^0        10^-3       10^-3
        5       10^0        10^0        10^-5       10^-5
        6       10^-1       10^0        10^-13      10^-13
        7       10^-1       10^0        10^-28      10^-13
        8       10^-4       10^0        converg.    converg.
        9       10^-8       10^0
       10       10^-8       10^0
       11       10^-16      10^0
       12       10^-32      10^0
       13       converg.    converg.

The roundoff properties of the QR algorithm are what one would expect
of any orthogonal matrix technique. The computed real Schur form $\hat T$ is
orthogonally similar to a matrix near to $A$, i.e.,

\[ Q^T(A + E)Q = \hat T \]

where $Q^TQ = I$ and $\|E\|_2 \approx u\|A\|_2$. The computed $\hat Q$ is almost
orthogonal in the sense that $\hat Q^T\hat Q = I + F$ where $\|F\|_2 \approx u$.

The order of the eigenvalues along $\hat T$ is somewhat arbitrary. But as we
discuss in §7.6, any ordering can be achieved by using a simple procedure
for swapping two adjacent diagonal entries.


7.5.7 Balancing 


Finally, we mention that if the elements of $A$ have widely varying magnitudes,
then $A$ should be balanced before applying the QR algorithm. This
is an $O(n^2)$ calculation in which a diagonal matrix $D$ is computed so that
if

\[ D^{-1}AD = [\,c_1, \ldots, c_n\,] = \begin{bmatrix} r_1^T \\ \vdots \\ r_n^T \end{bmatrix} \]

then $\|r_i\|_\infty \approx \|c_i\|_\infty$ for $i = 1{:}n$. The diagonal matrix $D$ is chosen to have
the form $D = \mathrm{diag}(\beta^{i_1}, \ldots, \beta^{i_n})$ where $\beta$ is the floating point base. Note
that $D^{-1}AD$ can be calculated without roundoff. When $A$ is balanced, the
computed eigenvalues are often more accurate. See Parlett and Reinsch
(1969).
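
A simplified balancing loop in the spirit of the Parlett and Reinsch procedure is sketched below. Several details are assumptions made for illustration: the function name `balance`, the use of off-diagonal 1-norms rather than infinity norms, the 0.95 acceptance factor, the sweep limit, and the omission of any preliminary permutation step.

```python
import numpy as np

def balance(A, base=2.0, max_sweeps=50):
    """Return (B, d) with B = D^{-1} A D, D = diag(d), each d_i a power of `base`."""
    B = np.array(A, dtype=float)
    n = B.shape[0]
    d = np.ones(n)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            c = np.sum(np.abs(B[:, i])) - abs(B[i, i])   # off-diagonal column norm
            r = np.sum(np.abs(B[i, :])) - abs(B[i, i])   # off-diagonal row norm
            if c == 0.0 or r == 0.0:
                continue
            f, total = 1.0, c + r
            while c < r / base:          # column too small: scale it up
                f *= base
                c *= base * base
            while c >= r * base:         # column too large: scale it down
                f /= base
                c /= base * base
            if (c + r) / f < 0.95 * total:   # accept only a worthwhile reduction
                d[i] *= f
                B[i, :] /= f             # row i of D^{-1} A D
                B[:, i] *= f             # column i of D^{-1} A D
                changed = True
        if not changed:
            break
    return B, d

A = np.array([[1.0, 1e4, 0.0],
              [1e-4, 1.0, 1e4],
              [0.0, 1e-4, 1.0]])
B, d = balance(A)
```

Because every scale factor is a power of the base, the scaling itself introduces no rounding error, which is the point made in the text.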


Problems 


P7.5.1 Show that if $\bar H = Q^THQ$ is obtained by performing a single-shift QR step with
$H = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$, then $|\bar h_{21}| \le |y^2x| / [(w - z)^2 + y^2]$.

P7.5.2 Give a formula for the 2-by-2 diagonal matrix $D$ that minimizes $\|D^{-1}AD\|_F$
where $A = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$.


P7.5.3 Explain how the single-shift QR step $H - \mu I = UR$, $\bar H = RU + \mu I$ can be
carried out implicitly. That is, show how the transition from $H$ to $\bar H$ can be carried out
without subtracting the shift $\mu$ from the diagonal of $H$.


P7.5.4 Suppose $H$ is upper Hessenberg and that we compute the factorization $PH =
LU$ via Gaussian elimination with partial pivoting. (See Algorithm 4.3.4.) Show that
$H_1 = U(P^TL)$ is upper Hessenberg and similar to $H$. (This is the basis of the modified
LR algorithm.)

P7.5.5 Show that if $H = H_0$ is given and we generate the matrices $H_k$ via $H_k - \mu_kI
= U_kR_k$, $H_{k+1} = R_kU_k + \mu_kI$, then

\[ (U_1\cdots U_j)(R_j\cdots R_1) = (H - \mu_1I)\cdots(H - \mu_jI). \]


Notes and References for Sec. 7.5 


The development of the practical QR algorithm began with the important paper 


H. Rutishauser (1958). “Solution of Eigenvalue Problems with the LR Transformation,”
Nat. Bur. Stand. App. Math. Ser. 49, 47-81.


The algorithm described here was then “orthogonalized” in 


J.G.F. Francis (1961). “The QR Transformation: A Unitary Analogue to the LR
Transformation, Parts I and II,” Comp. J. 4, 265-72, 332-45.


Descriptions of the practical QR algorithm may be found in Wilkinson (1965) and Stew- 
art (1973), and Watkins (1991). See also 


D. Watkins and L. Elsner (1991). “Chasing Algorithms for the Eigenvalue Problem,”
SIAM J. Matrix Anal. Appl. 12, 374-384.

D.S. Watkins and L. Elsner (1991). “Convergence of Algorithms of Decomposition Type 
for the Eigenvalue Problem,” Lin. Alg. and Its Application 143, 19—47. 

J. Erxiong (1992). “A Note on the Double-Shift QL Algorithm,” Lin.Alg. and Its 
Application 171, 121-132. 


Algol procedures for LR and QR methods are given in 


R.S. Martin and J.H. Wilkinson (1968). “The Modified LR Algorithm for Complex
Hessenberg Matrices,” Numer. Math. 12, 369-76. See also Wilkinson and Reinsch (1971,
pp. 396-403).

R.S. Martin, G. Peters, and J.H. Wilkinson (1970). “The QR Algorithm for Real
Hessenberg Matrices,” Numer. Math. 14, 219-31. See also Wilkinson and Reinsch (1971,
pp. 359-71).


Aspects of the balancing problem are discussed in 

E.E. Osborne (1960). “On Preconditioning of Matrices,” JACM 7, 338-45.

B.N. Parlett and C. Reinsch (1969). “Balancing a Matrix for Calculation of Eigenvalues
and Eigenvectors,” Numer. Math. 13, 292-304. See also Wilkinson and
Reinsch (1971, pp. 315-26).


High performance eigenvalue solver papers include 

Z. Bai and J.W. Demmel (1989). “On a Block Implementation of Hessenberg Multishift 
QR Iteration,” Int'l J. of High Speed Comput. 1, 97-112. 

G. Shroff (1991). “A Parallel Algorithm for the Eigenvalues and Eigenvectors of a 
General Complex Matrix,” Numer. Math. 58, 779-806. 

R.A. Van De Geijn (1993). “Deferred Shifting Schemes for Parallel QR Methods,” SIAM
J. Matrix Anal. Appl. 14, 180-194.


A.A. Dubrulle and G.H. Golub (1994). “A Multishift QR Iteration Without Computa- 
tion of the Shifts,” Numerical Algorithms 7, 173-181. 


7.6 Invariant Subspace Computations 


Several important invariant subspace problems can be solved once the real
Schur decomposition $Q^TAQ = T$ has been computed. In this section we
discuss how to


• compute the eigenvectors associated with some subset of $\lambda(A)$,

• compute an orthonormal basis for a given invariant subspace,

• block-diagonalize $A$ using well-conditioned similarity transformations,

• compute a basis of eigenvectors regardless of their condition, and

• compute an approximate Jordan canonical form of $A$.

Eigenvector/invariant subspace computation for sparse matrices is discussed
elsewhere. See §7.3 as well as portions of Chapters 8 and 9.


7.6.1 Selected Eigenvectors via Inverse Iteration 


Let $q^{(0)} \in \mathbb{C}^n$ be a given unit 2-norm vector and assume that $A - \mu I \in \mathbb{R}^{n\times n}$
is nonsingular. The following is referred to as inverse iteration:


    for $k = 1, 2, \ldots$
        Solve $(A - \mu I)z^{(k)} = q^{(k-1)}$
        $q^{(k)} = z^{(k)} / \|z^{(k)}\|_2$                      (7.6.1)
        $\lambda^{(k)} = q^{(k)T}Aq^{(k)}$
    end

Inverse iteration is just the power method applied to $(A - \mu I)^{-1}$.
To analyze the behavior of (7.6.1), assume that $A$ has a basis of
eigenvectors $\{x_1, \ldots, x_n\}$ and that $Ax_i = \lambda_ix_i$ for $i = 1{:}n$. If

\[ q^{(0)} = \sum_{i=1}^n \beta_ix_i \]

then $q^{(k)}$ is a unit vector in the direction of

\[ (A - \mu I)^{-k}q^{(0)} = \sum_{i=1}^n \frac{\beta_i}{(\lambda_i - \mu)^k}\,x_i. \]

Clearly, if $\mu$ is much closer to an eigenvalue $\lambda_j$ than to the other eigenvalues,
then $q^{(k)}$ is rich in the direction of $x_j$ provided $\beta_j \ne 0$.

A sample stopping criterion for (7.6.1) might be to quit as soon as the
residual

\[ r^{(k)} = (A - \mu I)q^{(k)} \]

satisfies

\[ \|r^{(k)}\|_\infty \le c\,u\,\|A\|_\infty \tag{7.6.2} \]

where $c$ is a constant of order unity. Since

\[ (A + E_k)q^{(k)} = \mu q^{(k)} \]

with $E_k = -r^{(k)}q^{(k)T}$ it follows that (7.6.2) forces $\mu$ and $q^{(k)}$ to be an
exact eigenpair for a nearby matrix.
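
The iteration (7.6.1) together with the stopping test (7.6.2) can be sketched as follows. The function name `inverse_iteration`, the defaults `c = 1.0` and `max_iter = 10`, and the symmetric test matrix are assumptions for illustration. A dense solve is used at every step for simplicity, whereas in practice $A - \mu I$ would be factored once.

```python
import numpy as np

def inverse_iteration(A, mu, q0, c=1.0, max_iter=10):
    """Run (7.6.1) until the residual test (7.6.2) holds or max_iter is reached."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    u = np.finfo(float).eps / 2              # unit roundoff (assumed: IEEE double)
    shifted = A - mu * np.eye(n)
    q = np.asarray(q0, dtype=float)
    q = q / np.linalg.norm(q)
    lam = q @ A @ q
    for _ in range(max_iter):
        z = np.linalg.solve(shifted, q)      # solve (A - mu I) z = q
        q = z / np.linalg.norm(z)
        lam = q @ A @ q                      # Rayleigh quotient estimate
        r = shifted @ q                      # residual with respect to the shift
        if np.linalg.norm(r, np.inf) <= c * u * np.linalg.norm(A, np.inf):
            break
    return lam, q

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lam, q = inverse_iteration(A, mu=1.27, q0=[1.0, 1.0, 1.0])
```

With a shift that is only a rough eigenvalue estimate, as here, the test (7.6.2) is never met and the loop runs to `max_iter`. The returned Rayleigh quotient still converges to the eigenvalue nearest the shift.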

Inverse iteration can be used in conjunction with the QR algorithm as 
follows: 


• Compute the Hessenberg decomposition $U_0^TAU_0 = H$.


• Apply the double implicit shift Francis iteration to $H$ without
accumulating transformations.


• For each computed eigenvalue $\lambda$ whose corresponding eigenvector $x$
is sought, apply (7.6.1) with $A = H$ and $\mu = \lambda$ to produce a vector $z$
such that $Hz \approx \mu z$.


• Set $x = U_0z$.


Inverse iteration with $H$ is very economical because (1) we do not have to
accumulate transformations during the double Francis iteration; (2) we can
factor matrices of the form $H - \lambda I$ in $O(n^2)$ flops, and (3) only one iteration
is typically required to produce an adequate approximate eigenvector.
This last point is perhaps the most interesting aspect of inverse iteration
and requires some justification since $\lambda$ can be comparatively inaccurate if
it is ill-conditioned. Assume for simplicity that $\lambda$ is real and let

\[ H - \lambda I = \sum_{i=1}^n \sigma_iu_iv_i^T = U\Sigma V^T \]

be the SVD of $H - \lambda I$. From what we said about the roundoff properties
of the QR algorithm in §7.5.6, there exists a matrix $E \in \mathbb{R}^{n\times n}$ such that
$H + E - \lambda I$ is singular and $\|E\|_2 \approx u\|H\|_2$. It follows that $\sigma_n \approx u\sigma_1$ and
$\|(H - \lambda I)v_n\|_2 \approx u\sigma_1$, i.e., $v_n$ is a good approximate eigenvector. Clearly
if the starting vector $q^{(0)}$ has the expansion

\[ q^{(0)} = \sum_{i=1}^n \gamma_iu_i \]

then

\[ z^{(1)} = \sum_{i=1}^n \frac{\gamma_i}{\sigma_i}\,v_i \]

is “rich” in the direction $v_n$. Note that if $s(\lambda) \approx |u_n^Tv_n|$ is small, then
$z^{(1)}$ is rather deficient in the direction $u_n$. This explains (heuristically)
why another step of inverse iteration is not likely to produce an improved
eigenvector approximation, especially if $\lambda$ is ill-conditioned. For more details,
see Peters and Wilkinson (1979).


Example 7.6.1 The matrix

\[ A = \begin{bmatrix} 1 & 1 \\ 10^{-10} & 1 \end{bmatrix} \]

has eigenvalues $\lambda_1 = .99999$ and $\lambda_2 = 1.00001$ and corresponding eigenvectors $x_1 =
[1, -10^{-5}]^T$ and $x_2 = [1, 10^{-5}]^T$. The condition of both eigenvalues is of order $10^5$.
The approximate eigenvalue $\mu = 1$ is an exact eigenvalue of $A + E$ where

\[ E = \begin{bmatrix} 0 & 0 \\ -10^{-10} & 0 \end{bmatrix}. \]

Thus, the quality of $\mu$ is typical of the quality of an eigenvalue produced by the QR
algorithm when executed in 10-digit floating point.

If (7.6.1) is applied with starting vector $q^{(0)} = [0, 1]^T$, then $q^{(1)} = [1, 0]^T$ and
$\|Aq^{(1)} - \mu q^{(1)}\|_2 = 10^{-10}$. However, one more step produces $q^{(2)} = [0, 1]^T$ for which
$\|Aq^{(2)} - \mu q^{(2)}\|_2 = 1$. This example is discussed in Peters and Wilkinson (1979).
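
The behaviour in this example, where a second inverse iteration step makes the residual worse, can be reproduced in a few lines. The variable names are arbitrary, and the matrix, shift, and starting vector are the ones stated in the example.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1e-10, 1.0]])
mu = 1.0
shifted = A - mu * np.eye(2)

q = np.array([0.0, 1.0])            # starting vector q^(0)
iterates, residuals = [], []
for _ in range(2):
    z = np.linalg.solve(shifted, q)
    q = z / np.linalg.norm(z)
    iterates.append(q.copy())
    residuals.append(np.linalg.norm(A @ q - mu * q))
```

The first entry of `residuals` is about $10^{-10}$ and the second is about 1, so the first iterate is the better approximate eigenvector.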


7.6.2 Ordering Eigenvalues in the Real Schur Form


Recall that the real Schur decomposition provides information about
invariant subspaces. If

\[ Q^TAQ = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \]

(row and column blocks of sizes $p$ and $q$)
and $\lambda(T_{11}) \cap \lambda(T_{22}) = \emptyset$, then the first $p$ columns of $Q$ span the unique
invariant subspace associated with $\lambda(T_{11})$. (See §7.1.4.) Unfortunately, the
Francis iteration supplies us with a real Schur decomposition $Q_F^TAQ_F = T_F$
in which the eigenvalues appear somewhat randomly along the diagonal of
$T_F$. This poses a problem if we want an orthonormal basis for an invariant
subspace whose associated eigenvalues are not at the top of $T_F$'s diagonal.
Clearly, we need a method for computing an orthogonal matrix $Q_D$
such that $Q_D^TT_FQ_D$ is upper quasi-triangular with appropriate eigenvalue
ordering.

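A look at the 2-by-2 case suggests how this can be accomplished. Suppose

\[ Q_F^TAQ_F = T_F = \begin{bmatrix} \lambda_1 & t_{12} \\ 0 & \lambda_2 \end{bmatrix}, \qquad \lambda_1 \ne \lambda_2 \]

and that we wish to reverse the order of the eigenvalues. Note that $T_Fx =
\lambda_2x$ where

\[ x = \begin{bmatrix} t_{12} \\ \lambda_2 - \lambda_1 \end{bmatrix}. \]

Let $Q_D$ be a Givens rotation such that the second component of $Q_D^Tx$ is
zero. If $Q = Q_FQ_D$ then

\[ (Q^TAQ)e_1 = Q_D^TT_F(Q_De_1) = \lambda_2Q_D^T(Q_De_1) = \lambda_2e_1 \]

and so $Q^TAQ$ must have the form

\[ Q^TAQ = \begin{bmatrix} \lambda_2 & \pm t_{12} \\ 0 & \lambda_1 \end{bmatrix}. \]

By systematically interchanging adjacent pairs of eigenvalues using this
technique, we can move any subset of $\lambda(A)$ to the top of $T$'s diagonal
assuming that no 2-by-2 bumps are encountered along the way.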


Algorithm 7.6.1 Given an orthogonal matrix $Q \in \mathbb{R}^{n\times n}$, an upper
triangular matrix $T = Q^TAQ$, and a subset $\Delta = \{\lambda_1, \ldots, \lambda_p\}$ of $\lambda(A)$, the
following algorithm computes an orthogonal matrix $Q_D$ such that $Q_D^TTQ_D
= S$ is upper triangular and $\{s_{11}, \ldots, s_{pp}\} = \Delta$. The matrices $Q$ and $T$ are
overwritten by $QQ_D$ and $S$ respectively.


    while {t_11, ..., t_pp} ≠ Δ
        for k = 1:n-1
            if t_kk ∉ Δ and t_{k+1,k+1} ∈ Δ
                [c, s] = givens(T(k,k+1), T(k+1,k+1) - T(k,k))
                T(k:k+1, k:n) = [c s; -s c]^T T(k:k+1, k:n)
                T(1:k+1, k:k+1) = T(1:k+1, k:k+1) [c s; -s c]
                Q(1:n, k:k+1) = Q(1:n, k:k+1) [c s; -s c]
            end
        end
    end


This algorithm requires $k(12n)$ flops, where $k$ is the total number of required
swaps. The integer $k$ is never greater than $(n - p)p$.
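
A sketch of Algorithm 7.6.1 for a strictly upper triangular $T$ (no 2-by-2 bumps) is given below. Two departures from the text are assumptions made for illustration: membership in $\Delta$ is expressed through a user-supplied predicate `selected`, since exact floating point set membership is fragile, and the function names are hypothetical.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]]^T @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, -b / r

def swap_adjacent(T, Q, k):
    """Swap T[k, k] and T[k+1, k+1] (0-based k) by a Givens similarity, in place."""
    c, s = givens(T[k, k + 1], T[k + 1, k + 1] - T[k, k])
    G = np.array([[c, s], [-s, c]])
    T[k:k + 2, k:] = G.T @ T[k:k + 2, k:]
    T[:k + 2, k:k + 2] = T[:k + 2, k:k + 2] @ G
    Q[:, k:k + 2] = Q[:, k:k + 2] @ G
    T[k + 1, k] = 0.0                 # zero in exact arithmetic

def move_to_top(T, Q, selected):
    """Reorder so eigenvalues with selected(lambda) True lead the diagonal."""
    T = np.array(T, dtype=float)
    Q = np.array(Q, dtype=float)
    n = T.shape[0]
    swapped = True
    while swapped:
        swapped = False
        for k in range(n - 1):
            if not selected(T[k, k]) and selected(T[k + 1, k + 1]):
                swap_adjacent(T, Q, k)
                swapped = True
    return T, Q

T0 = np.array([[1.0, 2.0, 3.0],
               [0.0, 4.0, 5.0],
               [0.0, 0.0, 6.0]])
S, QD = move_to_top(T0, np.eye(3), lambda lam: lam > 3.5)
```

Here the eigenvalues 4 and 6 are moved ahead of 1, and the first two columns of `QD` span the corresponding invariant subspace of `T0`.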

The swapping gets a little more complicated when $T$ has 2-by-2 blocks
along its diagonal. See Ruhe (1970) and Stewart (1976) for details. Of
course, these interchanging techniques can be used to sort the eigenvalues,
say from maximum to minimum modulus.

Computing invariant subspaces by manipulating the real Schur
decomposition is extremely stable. If $\hat Q = [\hat q_1, \ldots, \hat q_n]$ denotes the computed
orthogonal matrix $Q$, then $\|\hat Q^T\hat Q - I\|_2 \approx u$ and there exists a matrix $E$
satisfying $\|E\|_2 \approx u\|A\|_2$ such that $(A + E)\hat q_i \in \mathrm{span}\{\hat q_1, \ldots, \hat q_p\}$ for
$i = 1{:}p$.


7.6.3 Block Diagonalization 


Let

\[ T = \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1q} \\ 0 & T_{22} & \cdots & T_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T_{qq} \end{bmatrix} \tag{7.6.3} \]

(row and column blocks of sizes $n_1, \ldots, n_q$)
be a partitioning of some real Schur canonical form $Q^TAQ = T \in \mathbb{R}^{n\times n}$
such that $\lambda(T_{11}), \ldots, \lambda(T_{qq})$ are disjoint. By Theorem 7.1.6 there exists a
matrix $Y$ such that $Y^{-1}TY = \mathrm{diag}(T_{11}, \ldots, T_{qq})$. A practical procedure
for determining $Y$ is now given together with an analysis of $Y$'s sensitivity
as a function of the above partitioning.

Partition $I_n = [E_1, \ldots, E_q]$ conformably with $T$ and define the matrix
$Y_{ij} \in \mathbb{R}^{n\times n}$ as follows:

\[ Y_{ij} = I_n + E_iZ_{ij}E_j^T, \qquad i < j, \quad Z_{ij} \in \mathbb{R}^{n_i\times n_j}. \]

In other words, $Y_{ij}$ looks just like the identity except that $Z_{ij}$ occupies the
$(i, j)$ block position. It follows that if $Y_{ij}^{-1}TY_{ij} = \bar T = (\bar T_{ij})$ then $T$ and $\bar T$
are identical except that

\[ \begin{aligned} \bar T_{ij} &= T_{ii}Z_{ij} - Z_{ij}T_{jj} + T_{ij} \\ \bar T_{ik} &= T_{ik} - Z_{ij}T_{jk} \qquad (k = j+1{:}q) \\ \bar T_{kj} &= T_{ki}Z_{ij} + T_{kj} \qquad (k = 1{:}i-1) \end{aligned} \]

Thus, $T_{ij}$ can be zeroed provided we have an algorithm for solving the
Sylvester equation

\[ FZ - ZG = C \tag{7.6.4} \]

where $F \in \mathbb{R}^{p\times p}$ and $G \in \mathbb{R}^{r\times r}$ are given upper quasi-triangular matrices
and $C \in \mathbb{R}^{p\times r}$.

Bartels and Stewart (1972) have devised a method for doing this. Let $C
= [c_1, \ldots, c_r]$ and $Z = [z_1, \ldots, z_r]$ be column partitionings. If $g_{k+1,k} = 0$,
then by comparing columns in (7.6.4) we find

\[ Fz_k - \sum_{i=1}^k g_{ik}z_i = c_k. \]

Thus, once we know $z_1, \ldots, z_{k-1}$ then we can solve the quasi-triangular
system

\[ (F - g_{kk}I)z_k = c_k + \sum_{i=1}^{k-1} g_{ik}z_i \]

for $z_k$. If $g_{k+1,k} \ne 0$, then $z_k$ and $z_{k+1}$ can be simultaneously found by
solving the 2p-by-2p system

\[ \begin{bmatrix} F - g_{kk}I & -g_{mk}I \\ -g_{km}I & F - g_{mm}I \end{bmatrix} \begin{bmatrix} z_k \\ z_m \end{bmatrix} = \begin{bmatrix} c_k \\ c_m \end{bmatrix} + \sum_{i=1}^{k-1} \begin{bmatrix} g_{ik}z_i \\ g_{im}z_i \end{bmatrix} \tag{7.6.5} \]

where $m = k+1$. By reordering the equations according to the permutation
$(1, p+1, 2, p+2, \ldots, p, 2p)$, a banded system is obtained that can be solved
in $O(p^2)$ flops. The details may be found in Bartels and Stewart (1972).
Here is the overall process for the case when $F$ and $G$ are each triangular.


Algorithm 7.6.2 (Bartels-Stewart Algorithm) Given $C \in \mathbb{R}^{p\times r}$ and
upper triangular matrices $F \in \mathbb{R}^{p\times p}$ and $G \in \mathbb{R}^{r\times r}$ that satisfy $\lambda(F) \cap
\lambda(G) = \emptyset$, the following algorithm overwrites $C$ with the solution to the
equation $FZ - ZG = C$.


    for k = 1:r
        C(1:p, k) = C(1:p, k) + C(1:p, 1:k-1)G(1:k-1, k)
        Solve (F - G(k,k)I)z = C(1:p, k) for z.
        C(1:p, k) = z
    end


This algorithm requires $pr(p + r)$ flops.
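
Algorithm 7.6.2 translates almost line for line into NumPy. In the sketch below the function names and the test data are assumptions, and a small hand-written back substitution is used for the triangular solves so that the operation count stays close to the one quoted above.

```python
import numpy as np

def back_substitute(U, b):
    """Solve U x = b for upper triangular U with nonzero diagonal."""
    p = U.shape[0]
    x = np.zeros(p)
    for i in range(p - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

def sylvester_triangular(F, G, C):
    """Solve F Z - Z G = C for upper triangular F (p-by-p) and G (r-by-r)."""
    F = np.asarray(F, dtype=float)
    G = np.asarray(G, dtype=float)
    Z = np.array(C, dtype=float)          # columns are overwritten by the solution
    p, r = Z.shape
    for k in range(r):
        # Right-hand side c_k + sum_{i<k} g_ik z_i uses the columns already solved.
        rhs = Z[:, k] + Z[:, :k] @ G[:k, k]
        Z[:, k] = back_substitute(F - G[k, k] * np.eye(p), rhs)
    return Z

F = np.array([[1.0, 2.0],
              [0.0, 3.0]])
G = np.array([[5.0, 1.0, 0.0],
              [0.0, 6.0, 2.0],
              [0.0, 0.0, 7.0]])
C = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
Z = sylvester_triangular(F, G, C)
```

The shifted matrices $F - g_{kk}I$ are nonsingular precisely because the spectra of $F$ and $G$ are disjoint.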
By zeroing the super diagonal blocks in T in the appropriate order, the 
entire matrix can be reduced to block diagonal form. 


Algorithm 7.6.3 Given an orthogonal matrix $Q \in \mathbb{R}^{n\times n}$, an upper quasi-triangular
matrix $T = Q^TAQ$, and the partitioning (7.6.3), the following
algorithm overwrites $Q$ with $QY$ where $Y^{-1}TY = \mathrm{diag}(T_{11}, \ldots, T_{qq})$.


    for j = 2:q
        for i = 1:j-1
            Solve T_ii Z - Z T_jj = -T_ij for Z using Algorithm 7.6.2.
            for k = j+1:q
                T_ik = T_ik - Z T_jk
            end
            for k = 1:q
                Q_kj = Q_ki Z + Q_kj
            end
        end
    end


The number of flops required by this algorithm is a complicated function
of the block sizes in (7.6.3).
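
The block loop of Algorithm 7.6.3 is sketched below. To keep the example short and valid for quasi-triangular blocks as well, the Sylvester equations are solved with a dense Kronecker-product formulation rather than with Algorithm 7.6.2. That substitution, the function names, and the test partitioning are assumptions for illustration.

```python
import numpy as np

def solve_sylvester_dense(F, G, C):
    """Solve F Z - Z G = C via (I kron F - G^T kron I) vec(Z) = vec(C)."""
    p, r = C.shape
    K = np.kron(np.eye(r), F) - np.kron(G.T, np.eye(p))
    z = np.linalg.solve(K, C.flatten(order="F"))      # column-major vec
    return z.reshape((p, r), order="F")

def block_diagonalize(T, Q, sizes):
    """Return (T_bd, QY) following the loop structure of Algorithm 7.6.3."""
    T = np.array(T, dtype=float)
    Q = np.array(Q, dtype=float)
    ends = np.cumsum(sizes)
    blk = [slice(e - n_i, e) for e, n_i in zip(ends, sizes)]
    nblk = len(sizes)
    for j in range(1, nblk):
        for i in range(j):
            Z = solve_sylvester_dense(T[blk[i], blk[i]], T[blk[j], blk[j]],
                                      -T[blk[i], blk[j]])
            T[blk[i], blk[j]] = 0.0                    # T_ii Z - Z T_jj + T_ij = 0
            for k in range(j + 1, nblk):
                T[blk[i], blk[k]] -= Z @ T[blk[j], blk[k]]
            Q[:, blk[j]] += Q[:, blk[i]] @ Z           # accumulate Q Y
    return T, Q

T0 = np.array([[1.0, 3.0, 4.0, 5.0],
               [0.0, 2.0, 6.0, 7.0],
               [0.0, 0.0, 5.0, 8.0],
               [0.0, 0.0, 0.0, 9.0]])
T_bd, Y = block_diagonalize(T0, np.eye(4), sizes=[2, 1, 1])
```

Starting from `Q = I`, the second output is the matrix $Y$ itself, so `Y^{-1} T0 Y` should reproduce `T_bd`.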

The choice of the real Schur form T and its partitioning in (7.6.3) de- 
termines the sensitivity of the Sylvester equations that must be solved in 
Algorithm 7.6.3. This in turn affects the condition of the matrix Y and 
the overall usefulness of the block diagonalization. The reason for these 
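dependencies is that the relative error of the computed solution $\hat{Z}$ to

$$ T_{ii} Z - Z T_{jj} = -T_{ij} \qquad (7.6.6) $$

satisfies

$$ \frac{\|\hat{Z} - Z\|_F}{\|Z\|_F} \approx \mathbf{u}\,\frac{\|T\|_F}{\mathrm{sep}(T_{ii},T_{jj})}. $$

For details, see Golub, Nash, and Van Loan (1979). Since

$$ \mathrm{sep}(T_{ii},T_{jj}) = \min_{X \neq 0} \frac{\|T_{ii}X - XT_{jj}\|_F}{\|X\|_F} \le \min_{\lambda \in \lambda(T_{ii}),\ \mu \in \lambda(T_{jj})} |\lambda - \mu| $$

there can be a substantial loss of accuracy whenever the subsets $\lambda(T_{ii})$ are
insufficiently separated. Moreover, if Z satisfies (7.6.6) then

$$ \|Z\|_F \le \frac{\|T_{ij}\|_F}{\mathrm{sep}(T_{ii},T_{jj})}. $$

Thus, large-norm solutions can be expected if $\mathrm{sep}(T_{ii},T_{jj})$ is small. This
tends to make the matrix Y in Algorithm 7.6.3 ill-conditioned since it is
the product of the matrices

$$ Y_{ij} = \begin{bmatrix} I & Z \\ 0 & I \end{bmatrix}. $$

Note: $\kappa_F(Y_{ij}) = 2n + \|Z\|_F^2$.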

Confronted with these difficulties, Bavely and Stewart (1979) develop 
an algorithm for block diagonalizing that dynamically determines the eigen- 
value ordering and partitioning in (7.6.3) so that all the Z matrices in Al- 
gorithm 7.6.3 are bounded in norm by some user-supplied tolerance. They 
find that the condition of Y can be controlled by controlling the condition 
of the $Y_{ij}$.
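
For small blocks the separation can be evaluated directly, which makes the bounds above easy to check numerically. The sketch below assumes the standard identity that sep is the smallest singular value of the Kronecker matrix of the map $X \mapsto T_{ii}X - XT_{jj}$; the helper names `sylvester_matrix` and `sep_frobenius` and the test matrices are illustrative choices, not taken from the text.

```python
import numpy as np

def sylvester_matrix(A, B):
    """Matrix of X -> A X - X B acting on vec(X) (columns stacked)."""
    m, n = A.shape[0], B.shape[0]
    return np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))

def sep_frobenius(A, B):
    """sep(A, B) = min over X != 0 of ||A X - X B||_F / ||X||_F."""
    return np.linalg.svd(sylvester_matrix(A, B), compute_uv=False)[-1]

# Two nonnormal blocks whose eigenvalues are separated by at least 0.5.
T11 = np.array([[1.0, 50.0], [0.0, 2.0]])
T22 = np.array([[2.5, 30.0], [0.0, 4.0]])
T12 = np.ones((2, 2))

sep = sep_frobenius(T11, T22)
gap = min(abs(a - b) for a in np.diag(T11) for b in np.diag(T22))

# Solve T11 Z - Z T22 = -T12 through the same Kronecker matrix.
z = np.linalg.solve(sylvester_matrix(T11, T22), -T12.reshape(-1, order="F"))
Z = z.reshape((2, 2), order="F")

print("sep =", sep, " eigenvalue gap =", gap)
print("||Z||_F =", np.linalg.norm(Z, "fro"),
      " bound =", np.linalg.norm(T12, "fro") / sep)
```

For nonnormal blocks such as these, sep can be far smaller than the eigenvalue gap, which is the situation in which the Z matrices of Algorithm 7.6.3 become large.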


7.6.4 Eigenvector Bases


If the blocks in the partitioning (7.6.3) are all 1-by-1, then Algorithm 7.6.3 
produces a basis of eigenvectors. As with the method of inverse iteration, 
the computed eigenvalue-eigenvector pairs are exact for some “nearby” ma- 
trix. A widely followed rule of thumb for deciding upon a suitable eigen- 
vector method is to use inverse iteration whenever fewer than 25% of the 
eigenvectors are desired. 

We point out, however, that the real Schur form can be used to deter- 
mine selected eigenvectors. Suppose 


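$$ Q^T A Q = \begin{bmatrix} T_{11} & u & T_{13} \\ 0 & \lambda & v^T \\ 0 & 0 & T_{33} \end{bmatrix} $$

(with block rows and block columns of dimension k-1, 1, and n-k)
is upper quasi-triangular and that $\lambda \notin \lambda(T_{11}) \cup \lambda(T_{33})$. It follows that if we
solve the linear systems $(T_{11} - \lambda I)w = -u$ and $(T_{33} - \lambda I)^T z = -v$ then

$$ x = Q \begin{bmatrix} w \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad y = Q \begin{bmatrix} 0 \\ 1 \\ z \end{bmatrix} $$

are the associated right and left eigenvectors, respectively. Note that the
condition of $\lambda$ is prescribed by $1/s(\lambda) = \sqrt{(1 + w^T w)(1 + z^T z)}$.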
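
A small NumPy sketch of this recipe, for the case where T is upper triangular, is given below. The function name `eigvecs_from_schur` and the 0-based index `k` are assumptions made for the illustration, and dense solves are used where a triangular solver would normally be preferred.

```python
import numpy as np

def eigvecs_from_schur(T, Q, k):
    """Right/left eigenvectors of A = Q T Q^T for lam = T[k, k] (0-based k).

    Assumes T is upper triangular and lam is not repeated on its diagonal.
    """
    n = T.shape[0]
    lam = T[k, k]
    u = T[:k, k]                  # column above the diagonal entry
    v = T[k, k + 1:]              # row to the right of the diagonal entry
    w = np.linalg.solve(T[:k, :k] - lam * np.eye(k), -u)
    z = np.linalg.solve((T[k + 1:, k + 1:] - lam * np.eye(n - k - 1)).T, -v)
    x = Q @ np.concatenate((w, [1.0], np.zeros(n - k - 1)))
    y = Q @ np.concatenate((np.zeros(k), [1.0], z))
    inv_s = np.sqrt((1.0 + w @ w) * (1.0 + z @ z))   # 1 / s(lam)
    return lam, x, y, inv_s

rng = np.random.default_rng(2)
T = np.triu(rng.standard_normal((4, 4)), 1) + np.diag([1.0, 2.0, 3.0, 4.0])
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ T @ Q.T
lam, x, y, inv_s = eigvecs_from_schur(T, Q, 1)
print(np.linalg.norm(A @ x - lam * x), np.linalg.norm(y @ A - lam * y), inv_s)
```

The returned quantity agrees with the usual definition $1/s(\lambda) = \|x\|_2\|y\|_2/|y^Tx|$ because $y^Tx = 1$ by construction.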
7.6.5 Ascertaining Jordan Block Structures


Suppose that we have computed the real Schur decomposition $A = QTQ^T$,
identified clusters of “equal” eigenvalues, and calculated the corresponding
block diagonalization $T = Y \mathrm{diag}(T_{11},\ldots,T_{qq}) Y^{-1}$. As we have seen, this
can be a formidable task. However, even greater numerical problems con-
front us if we attempt to ascertain the Jordan block structure of each $T_{ii}$. A
brief examination of these difficulties will serve to highlight the limitations
of the Jordan decomposition.


Assume for clarity that $\lambda(T_{ii})$ is real. The reduction of $T_{ii}$ to Jordan
form begins by replacing it with a matrix of the form $C = \lambda I + N$, where
N is the strictly upper triangular portion of $T_{ii}$ and where $\lambda$, say, is the
mean of its eigenvalues.


Recall that the dimension of a Jordan block $J(\lambda)$ is the smallest non-
negative integer k for which $[J(\lambda) - \lambda I]^k = 0$. Thus, if $p_i = \dim[\mathrm{null}(N^i)]$,
for i = 0:n, then $p_i - p_{i-1}$ equals the number of blocks in C's Jordan
form that have dimension i or greater. A concrete example helps to make
this assertion clear and to illustrate the role of the SVD in Jordan form
computations.


Assume that C is 7-by-7. Suppose we compute the SVD $U_1^T N V_1 = \Sigma_1$
and “discover” that N has rank 3. If we order the singular values from
small to large then it follows that the matrix $N_1 = V_1^T N V_1$ has the form

$$ N_1 = \begin{bmatrix} 0 & K \\ 0 & L \end{bmatrix} $$

where the block rows and block columns have dimension 4 and 3.
At this point, we know that the geometric multiplicity of $\lambda$ is 4, i.e., C's
Jordan form has 4 blocks ($p_1 - p_0 = 4 - 0 = 4$).


Now suppose $\tilde{U}_2^T L \tilde{V}_2 = \Sigma_2$ is the SVD of L and that we find that L has
unit rank. If we again order the singular values from small to large, then
$L_2 = \tilde{V}_2^T L \tilde{V}_2$ clearly has the following structure:

$$ L_2 = \begin{bmatrix} 0 & 0 & a \\ 0 & 0 & b \\ 0 & 0 & c \end{bmatrix}. $$

However $\lambda(L_2) = \lambda(L) = \{0,0,0\}$ and so c = 0. Thus, if

$$ V_2 = \mathrm{diag}(I_4, \tilde{V}_2) $$

then $N_2 = V_2^T N_1 V_2$ has the following form:

          0 0 0 0 x x x
          0 0 0 0 x x x
          0 0 0 0 x x x
    N_2 = 0 0 0 0 x x x
          0 0 0 0 0 0 a
          0 0 0 0 0 0 b
          0 0 0 0 0 0 0


Besides allowing us to introduce more zeroes into the upper triangle, the
SVD of L also enables us to deduce the dimension of the null space of $N^2$.
Since

$$ N_1^2 = \begin{bmatrix} 0 & KL \\ 0 & L^2 \end{bmatrix} = \begin{bmatrix} 0 & K \\ 0 & L \end{bmatrix} \begin{bmatrix} 0 & K \\ 0 & L \end{bmatrix} $$

and $\begin{bmatrix} K \\ L \end{bmatrix}$ has full column rank,

$$ p_2 = \dim(\mathrm{null}(N^2)) = \dim(\mathrm{null}(N_1^2)) = 4 + \dim(\mathrm{null}(L)) = p_1 + 2. $$

Hence, we can conclude at this stage that the Jordan form of C has at least
two blocks of dimension 2 or greater.

Finally, it is easy to see that $N_1^3 = 0$, from which we conclude that there
is $p_3 - p_2 = 7 - 6 = 1$ block of dimension 3 or larger. If we define $V = V_1 V_2$
then it follows that the decomposition

              λ 0 0 0 x x x
              0 λ 0 0 x x x
              0 0 λ 0 x x x     } 4 blocks of order 1 or larger
    V^T C V = 0 0 0 λ x x x
              0 0 0 0 λ x a     } 2 blocks of order 2 or larger
              0 0 0 0 0 λ 0
              0 0 0 0 0 0 λ     } 1 block of order 3 or larger

“displays” C's Jordan block structure: 2 blocks of order 1, 1 block of order
2, and 1 block of order 3.

To compute the Jordan decomposition it is necessary to resort to non- 
orthogonal transformations. We refer the reader to either Golub and Wilkin-
son (1976) or Kågström and Ruhe (1980a, 1980b) for how to proceed with
this phase of the reduction. 

The above calculations with the SVD amply illustrate that difficult 
rank decisions must be made at each stage and that the final computed 
block structure depends critically on those decisions. Fortunately, the sta- 
ble Schur decomposition can almost always be used in lieu of the Jordan 
decomposition in practical applications. 
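
The counting argument of the 7-by-7 example can be mimicked numerically. The sketch below is an assumption-laden illustration rather than the staged SVD reduction described above: it forms the powers $N^i$ explicitly, makes each rank decision with a fixed tolerance `tol`, and reports the differences $p_i - p_{i-1}$. The function name `jordan_block_counts` is hypothetical.

```python
import numpy as np

def jordan_block_counts(N, tol=1e-10):
    """Return [p_1 - p_0, p_2 - p_1, ...] with p_i = dim null(N^i).

    N is assumed to be (numerically) nilpotent. Entry i-1 of the result is
    the number of Jordan blocks of dimension i or greater.
    """
    n = N.shape[0]
    counts, p_prev, P = [], 0, np.eye(n)
    for _ in range(n):
        P = P @ N
        sv = np.linalg.svd(P, compute_uv=False)
        p = n - int(np.sum(sv > tol))   # rank decision: the delicate step
        counts.append(p - p_prev)
        p_prev = p
        if p == n:
            break
    return counts

# Nilpotent part with Jordan blocks of order 1, 1, 2, and 3.
N = np.zeros((7, 7))
N[2, 3] = 1.0
N[4, 5] = N[5, 6] = 1.0

# Hide the structure behind an orthogonal similarity transformation.
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((7, 7)))
counts = jordan_block_counts(V.T @ N @ V)
print(counts)   # 4 blocks in all, 2 of order >= 2, 1 of order >= 3
```

With noisy data the outcome depends entirely on `tol`, which is the difficulty that the paragraph above emphasizes.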
Problems


P7.6.1 Give a complete algorithm for solving a real, n-by-n, upper quasi-triangular 
system Tz = b. 


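P7.6.2 Suppose $U^{-1}AU = \mathrm{diag}(\alpha_1,\ldots,\alpha_m)$ and $V^{-1}BV = \mathrm{diag}(\beta_1,\ldots,\beta_n)$. Show
that if $\phi(X) = AX + XB$, then $\lambda(\phi) = \{\alpha_i + \beta_j : i = 1{:}m,\ j = 1{:}n\}$. What
are the corresponding eigenvectors? How can these decompositions be used to solve
$AX + XB = C$?


P7.6.3 Show that if $Y = \begin{bmatrix} I & Z \\ 0 & I \end{bmatrix}$ then $\kappa_2(Y) = [2 + \sigma^2 + \sqrt{4\sigma^2 + \sigma^4}\,]/2$ where
$\sigma = \|Z\|_2$.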
P7.6.4 Derive the system (7.6.5). 
P7.6.5 Assume that $T \in \mathbb{R}^{n \times n}$ is block upper triangular and partitioned as follows:

$$ T = \begin{bmatrix} T_{11} & T_{12} & T_{13} \\ 0 & T_{22} & T_{23} \\ 0 & 0 & T_{33} \end{bmatrix}, \qquad T \in \mathbb{R}^{n \times n}. $$

Suppose that the diagonal block $T_{22}$ is 2-by-2 with complex eigenvalues that are disjoint
from $\lambda(T_{11})$ and $\lambda(T_{33})$. Give an algorithm for computing the 2-dimensional real invari-
ant subspace associated with $T_{22}$'s eigenvalues.


P7.6.6 Suppose $H \in \mathbb{R}^{n \times n}$ is upper Hessenberg with a complex eigenvalue $\lambda + i\mu$. How
could inverse iteration be used to compute $x, y \in \mathbb{R}^n$ so that $H(x + iy) = (\lambda + i\mu)(x + iy)$?
Hint: compare real and imaginary parts in this equation and obtain a 2n-by-2n real sys-
tem.


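P7.6.7 (a) Prove that if $\mu_0 \in \mathbb{C}$ has nonzero real part, then the iteration

$$ \mu_{k+1} = \frac{1}{2}\left( \mu_k + \frac{1}{\mu_k} \right) $$

converges to 1 if $\mathrm{Re}(\mu_0) > 0$ and to -1 if $\mathrm{Re}(\mu_0) < 0$. (b) Suppose $A \in \mathbb{C}^{n \times n}$ is
diagonalizable and that

$$ A = X \begin{bmatrix} D_+ & 0 \\ 0 & D_- \end{bmatrix} X^{-1} $$

where $D_+ \in \mathbb{C}^{p \times p}$ and $D_- \in \mathbb{C}^{(n-p) \times (n-p)}$ are diagonal with eigenvalues in the open
right half plane and open left half plane, respectively. Show that the iteration

$$ A_{k+1} = \frac{1}{2}\left( A_k + A_k^{-1} \right), \qquad A_0 = A $$

converges to

$$ \mathrm{sign}(A) = X \begin{bmatrix} I_p & 0 \\ 0 & -I_{n-p} \end{bmatrix} X^{-1}. $$

(c) Suppose

$$ M = \begin{bmatrix} M_{11} & M_{12} \\ 0 & M_{22} \end{bmatrix} $$

(with block rows and block columns of dimension p and n-p)
with the property that $\lambda(M_{11})$ is in the open right half plane and $\lambda(M_{22})$ is in the open
left half plane. Show that

$$ \mathrm{sign}(M) = \begin{bmatrix} I_p & Z \\ 0 & -I_{n-p} \end{bmatrix} $$

and that $-Z/2$ solves $M_{11}X - XM_{22} = -M_{12}$. Thus,

$$ U = \begin{bmatrix} I_p & -Z/2 \\ 0 & I_{n-p} \end{bmatrix} \quad \Rightarrow \quad U^{-1}MU = \begin{bmatrix} M_{11} & 0 \\ 0 & M_{22} \end{bmatrix}. $$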
G. Peters and J.H. Wilkinson (1979). “Inverse Iteration, Ill-Conditioned Equations, and
J.J. Dongarra, S. Hammarling, and J.H. Wilkinson (1992). “Numerical Considerations
in Computing Invariant Subspaces,” SIAM J. Matrix Anal. Appl. 13, 145-161.
C. Bavely and G.W. Stewart (1979). “An Algorithm for Computing Reducing Subspaces
by Block Diagonalization,” SIAM J. Num. Anal. 16, 359-67.
Computation of the Jordan Normal Form of a Complex Matrix,” ACM Trans. Math.
Berkeley.
S.P. Chan and B.N. Parlett (1977). “Algorithm 517: A Program for Computing the
Condition Numbers of Matrix Eigenvalues Without Computing Eigenvectors,” ACM
Trans. Math. Soft. 3, 186-203.
Lin. Alg. and Its Applic. 88/89, 715-732.

Z. Bai, J. Demmel, and A. McKenney (1993). “On Computing Condition Numbers for 
the Nonsymmetric Eigenproblem,” ACM Trans. Math. Soft. 19, 202-223. 


As we have seen, the sep(.,.) function is of great importance in the assessment of a com-
puted invariant subspace. Aspects of this quantity and the associated Sylvester equation
J. Varah (1979). “On the Separation of Two Matrices,” SIAM J. Num. Anal. 16,
212-22.

R. Byers (1984). “A Linpack-Style Condition Estimator for the Equation $AX - XB^T = C$,”
IEEE Trans. Auto. Cont. AC-29, 926-928.
and Its Appl. 109, 91-105.

N.J. Higham (1993). “Perturbation Theory and Backward Error for AX - XB = C,”
BIT 33, 124-136.

J. Gardiner, M.R. Wette, A.J. Laub, J.J. Amato, and C.B. Moler (1992). “Algorithm
705: A FORTRAN-77 Software Package for Solving the Sylvester Matrix Equation
$AXB^T + CXD^T = E$,” ACM Trans. Math. Soft. 18, 232-238.


Numerous algorithms have been proposed for the Sylvester equation, but those described 
in 


R.H. Bartels and G.W. Stewart (1972). “Solution of the Equation AX + XB = C,”
Comm. ACM 15, 820-26.

G.H. Golub, S. Nash, and C. Van Loan (1979). “A Hessenberg-Schur Method for the
Matrix Problem AX + XB = C,” IEEE Trans. Auto. Cont. AC-24, 909-13.


are among the more reliable in that they rely on orthogonal transformations. A con-
strained Sylvester equation problem is considered in


J.B. Barlow, M.M. Monahemi, and D.P. O'Leary (1992). “Constrained Matrix Sylvester
Equations,” SIAM J. Matrix Anal. Appl. 13, 1-9.

The Lyapunov problem $FX + XF^T = -C$ where C is non-negative definite has a
very important role to play in control theory. See


S. Barnett and C. Storey (1968). “Some Applications of the Lyapunov Matrix Equation,” 
J. Inst. Math. Applic. 4, 33-42. 

G. Hewer and C. Kenney (1988). “The Sensitivity of the Stable Lyapunov Equation,” 
SIAM J. Control Optim 26, 321—344. 

A.R. Ghavimi and A.J. Laub (1995). “Residual Bounds for Discrete-Time Lyapunov 
Equations,” IEEE Trans. Auto. Cont. 40, 1244-1249. 


Several authors have considered generalizations of the Sylvester equation, i.e., $\sum F_i X G_i = C$.
These include


P. Lancaster (1970). “Explicit Solution of Linear Matrix Equations,” SIAM Review 12,
544-566.

H. Wimmer and A.D. Ziebur (1972). “Solving the Matrix Equations $\sum f_p(A) g_p(A) = C$,”
SIAM Review 14, 318-23.

W.J. Vetter (1975). “Vector Structures and Solutions of Linear Matrix Equations,” Lin.
Alg. and Its Applic. 10, 181-88.


Some ideas about improving computed eigenvalues, eigenvectors, and invariant sub-
spaces may be found in

J.J. Dongarra, C.B. Moler, and J.H. Wilkinson (1983). “Improving the Accuracy of
Computed Eigenvalues and Eigenvectors,” SIAM J. Numer. Anal. 20, 23-46.


J.W. Demmel (1987). “Three Methods for Refining Estimates of Invariant Subspaces,” 
Computing 38, 43-57. 


Hessenberg/QR iteration techniques are fast, but not very amenable to parallel computa- 
tion. Because of this there is a hunger for radically new approaches to the eigenproblem. 
Here are some papers that focus on the matrix sign function and related ideas that have 
high performance potential: 


C.S. Kenney and A.J. Laub (1991). “Rational Iterative Methods for the Matrix Sign 
Function,” SIAM J. Matrix Anal. Appl. 12, 273-291. 


C.S. Kenney, A.J. Laub, and P.M. Papadopoulos (1992). “Matrix Sign Algorithms for
Riccati Equations,” IMA J. of Math. Control Inform. 9, 331-344.


C.S. Kenney and A.J. Laub (1992). “On Scaling Newton’s Method for Polar Decompo-
sition and the Matrix Sign Function,” SIAM J. Matrix Anal. Appl. 13, 688-706.


N.J. Higham (1994). “The Matrix Sign Decomposition and Its Relation to the Polar
Decomposition,” Lin. Alg. and Its Applic. 212/213, 3-20.


L. Adams and P. Arbenz (1994). “Towards a Divide and Conquer Algorithm for the Real
Nonsymmetric Eigenvalue Problem,” SIAM J. Matrix Anal. Appl. 15, 1333-1353.


7.7 The QZ Method for Ax = λBx


Let A and B be two n-by-n matrices. The set of all matrices of the form
$A - \lambda B$ with $\lambda \in \mathbb{C}$ is said to be a pencil. The eigenvalues of the pencil
are elements of the set $\lambda(A,B)$ defined by

$$ \lambda(A,B) = \{ z \in \mathbb{C} : \det(A - zB) = 0 \}. $$

If $\lambda \in \lambda(A,B)$ and

$$ Ax = \lambda Bx, \qquad x \neq 0 \qquad (7.7.1) $$

then x is referred to as an eigenvector of $A - \lambda B$.

In this section we briefly survey some of the mathematical properties
of the generalized eigenproblem (7.7.1) and present a stable method for its
solution. The important case when A and B are symmetric with the latter
positive definite is discussed in §8.7.2.


7.7.1 Background


The first thing to observe about the generalized eigenvalue problem is that
there are n eigenvalues if and only if rank(B) = n. If B is rank deficient
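then $\lambda(A,B)$ may be finite, empty, or infinite:

$$ A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix},\quad B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad \Rightarrow \quad \lambda(A,B) = \{1\} $$

$$ A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix},\quad B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad \Rightarrow \quad \lambda(A,B) = \emptyset $$

$$ A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix},\quad B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad \Rightarrow \quad \lambda(A,B) = \mathbb{C} $$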


Note that if $0 \neq \lambda \in \lambda(A,B)$ then $(1/\lambda) \in \lambda(B,A)$. Moreover, if B is
nonsingular then $\lambda(A,B) = \lambda(B^{-1}A, I) = \lambda(B^{-1}A)$.

This last observation suggests one method for solving the $A - \lambda B$ prob-
lem when B is nonsingular:

• Solve BC = A for C using (say) Gaussian elimination with pivoting.


• Use the QR algorithm to compute the eigenvalues of C.


Note that C will be affected by roundoff errors of order $\mathbf{u}\,\|A\|_2 \|B^{-1}\|_2$.
If B is ill-conditioned, then this can rule out the possibility of computing
any generalized eigenvalue accurately, even those eigenvalues that may be
regarded as well-conditioned.
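
The two-step recipe can be tried directly in NumPy. The sketch below is an illustration only: the function name `pencil_eigs_via_solve` is hypothetical, `np.linalg.solve` performs LU factorization with partial pivoting, `np.linalg.eigvals` calls LAPACK's Hessenberg QR iteration, and the test matrices are those of Example 7.7.1 evaluated in IEEE double precision rather than 7-digit arithmetic.

```python
import numpy as np

def pencil_eigs_via_solve(A, B):
    """Eigenvalues of A - lambda*B for nonsingular B, via C = B^{-1} A."""
    C = np.linalg.solve(B, A)        # solve B C = A (Gaussian elimination with pivoting)
    return np.linalg.eigvals(C)      # QR algorithm applied to C

A = np.array([[1.746, 0.940],
              [1.246, 1.898]])
B = np.array([[0.780, 0.563],
              [0.913, 0.659]])

lam = np.sort(pencil_eigs_via_solve(A, B).real)
print("kappa_2(B) =", np.linalg.cond(B))
print("computed eigenvalues:", lam)
```

In double precision the unit roundoff is small enough that the eigenvalue near 2 survives the ill-conditioning of B; in lower precision the same computation loses most of its digits, as the example in the text reports.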


Example 7.7.1 If

$$ A = \begin{bmatrix} 1.746 & .940 \\ 1.246 & 1.898 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} .780 & .563 \\ .913 & .659 \end{bmatrix} $$

then $\lambda(A,B) = \{2,\ 1.07 \times 10^6\}$. With 7-digit floating point arithmetic, we find
$\lambda(fl(AB^{-1})) = \{1.562539,\ 1.01 \times 10^6\}$. The poor quality of the small eigenvalue is because
$\kappa_2(B) \approx 2 \times 10^6$. On the other hand, we find that

$$ \lambda(I, fl(A^{-1}B)) \approx \{2.000001,\ 1.06 \times 10^6\}. $$

The accuracy of the small eigenvalue is improved because $\kappa_2(A) \approx 4$.

Example 7.7.1 suggests that we seek an alternative approach to the $A - \lambda B$
problem. One idea is to compute well-conditioned Q and Z such that the
matrices

$$ A_1 = Q^{-1}AZ, \qquad B_1 = Q^{-1}BZ \qquad (7.7.2) $$

are each in canonical form. Note that $\lambda(A,B) = \lambda(A_1,B_1)$ since

$$ Ax = \lambda Bx \quad \Leftrightarrow \quad A_1 y = \lambda B_1 y, \qquad x = Zy. $$

We say that the pencils $A - \lambda B$ and $A_1 - \lambda B_1$ are equivalent if (7.7.2)
holds with nonsingular Q and Z.
7.7.2 The Generalized Schur Decomposition


As in the standard eigenproblem $A - \lambda I$ there is a choice between canonical
forms. Analogous to the Jordan form is a decomposition of Kronecker in
which both $A_1$ and $B_1$ are block diagonal. The blocks are similar to Jordan
blocks. The Kronecker canonical form poses the same numerical difficulties
as the Jordan form. However, this decomposition does provide insight into
the mathematical properties of the pencil $A - \lambda B$. See Wilkinson (1978)
and Demmel and Kågström (1987) for details.

More attractive from the numerical point of view is the following de-
composition described in Moler and Stewart (1973).


Theorem 7.7.1 (Generalized Schur Decomposition) If A and B are
in $\mathbb{C}^{n \times n}$, then there exist unitary Q and Z such that $Q^H A Z = T$ and
$Q^H B Z = S$ are upper triangular. If for some k, $t_{kk}$ and $s_{kk}$ are both zero,
then $\lambda(A,B) = \mathbb{C}$. Otherwise

$$ \lambda(A,B) = \{ t_{ii}/s_{ii} : s_{ii} \neq 0 \}. $$

Proof. Let $\{B_k\}$ be a sequence of nonsingular matrices that converge to B.
For each k, let $Q_k^H (A B_k^{-1}) Q_k = R_k$ be a Schur decomposition of $A B_k^{-1}$. Let
$Z_k$ be unitary such that $Z_k^H (B_k^{-1} Q_k) = S_k^{-1}$ is upper triangular. It follows
that both $Q_k^H A Z_k = R_k S_k$ and $Q_k^H B_k Z_k = S_k$ are also upper triangular.
Using the Bolzano-Weierstrass theorem, we know that the bounded se-
quence $\{(Q_k, Z_k)\}$ has a converging subsequence, $\lim (Q_{k_i}, Z_{k_i}) = (Q, Z)$.
It is easy to show that Q and Z are unitary and that $Q^H A Z$ and $Q^H B Z$
are upper triangular. The assertions about $\lambda(A,B)$ follow from the identity

$$ \det(A - \lambda B) = \det(Q Z^H) \prod_{i=1}^{n} (t_{ii} - \lambda s_{ii}). \qquad \Box $$


If A and B are real then the following decomposition, which corresponds
to the real Schur decomposition (Theorem 7.4.1), is of interest.


Theorem 7.7.2 (Generalized Real Schur Decomposition) If A and
B are in $\mathbb{R}^{n \times n}$ then there exist orthogonal matrices Q and Z such that
$Q^T A Z$ is upper quasi-triangular and $Q^T B Z$ is upper triangular.


Proof. See Stewart (1972). $\Box$

In the remainder of this section we are concerned with the computation of
this decomposition and the mathematical insight that it provides.


7.7.3 Sensitivity Issues 


The generalized Schur decomposition sheds light on the issue of eigenvalue
sensitivity for the $A - \lambda B$ problem. Clearly, small changes in A and B can
induce large changes in the eigenvalue $\lambda_i = t_{ii}/s_{ii}$ if $s_{ii}$ is small. However,
as Stewart (1978) argues, it may not be appropriate to regard such an
eigenvalue as “ill-conditioned.” The reason is that the reciprocal $\mu_i =
s_{ii}/t_{ii}$ might be a very well behaved eigenvalue for the pencil $\mu A - B$. In
the Stewart analysis, A and B are treated symmetrically and the eigenvalues
are regarded more as ordered pairs $(t_{ii}, s_{ii})$ than as quotients. With this
point of view it becomes appropriate to measure eigenvalue perturbations
in the chordal metric chord(a, b) defined by

$$ \mathrm{chord}(a,b) = \frac{|a - b|}{\sqrt{1 + a^2}\,\sqrt{1 + b^2}}. $$

Stewart shows that if $\lambda$ is a distinct eigenvalue of $A - \lambda B$ and $\lambda_\epsilon$ is the
corresponding eigenvalue of the perturbed pencil $\tilde{A} - \lambda \tilde{B}$ with $\|A - \tilde{A}\|_2 \approx
\|B - \tilde{B}\|_2 \approx \epsilon$, then

$$ \mathrm{chord}(\lambda, \lambda_\epsilon) \le \frac{\epsilon}{\sqrt{(y^H A x)^2 + (y^H B x)^2}} + O(\epsilon^2) $$

where x and y have unit 2-norm and satisfy $Ax = \lambda Bx$ and $y^H A = \lambda y^H B$.
Note that the denominator in the upper bound is symmetric in A and B.
The “truly” ill-conditioned eigenvalues are those for which this denominator
is small.


The extreme case when $t_{kk} = s_{kk} = 0$ for some k has been studied
by Wilkinson (1979). He makes the interesting observation that when this
occurs, the remaining quotients $t_{ii}/s_{ii}$ can assume arbitrary values.
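
The chordal metric is easy to experiment with. The short sketch below uses a hypothetical function name `chord` and writes $1 + |a|^2$ so that complex arguments are also handled, which is an assumption beyond the real formula displayed above. It illustrates the symmetry that motivates the metric: the distance between two eigenvalues equals the distance between their reciprocals.

```python
import numpy as np

def chord(a, b):
    """Chordal distance between two (possibly complex) numbers."""
    return abs(a - b) / (np.sqrt(1.0 + abs(a) ** 2) * np.sqrt(1.0 + abs(b) ** 2))

# Two huge eigenvalues of A - lambda*B that differ by a factor of two ...
big = chord(1.0e6, 2.0e6)
# ... correspond to two tiny, nearly equal eigenvalues of mu*A - B.
small = chord(1.0e-6, 0.5e-6)
print(big, small)
```

In the ordinary absolute-value metric the first pair is a million apart, yet in the chordal metric both pairs are at the same small distance.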


7.7.4 Hessenberg-Triangular Form 


The first step in computing the generalized Schur decomposition of the pair
(A, B) is to reduce A to upper Hessenberg form and B to upper triangular
form via orthogonal transformations. We first determine an orthogonal U
such that $U^T B$ is upper triangular. Of course, to preserve eigenvalues, we
must also update A in exactly the same way. Let's trace what happens in
the n = 5 case.

                x x x x x                   x x x x x
                x x x x x                   0 x x x x
    A = U^T A = x x x x x ,     B = U^T B = 0 0 x x x
                x x x x x                   0 0 0 x x
                x x x x x                   0 0 0 0 x
Next, we reduce A to upper Hessenberg form while preserving B’s upper
triangular form. First, a Givens rotation $Q_{45}$ is determined to zero $a_{51}$:

                   x x x x x                      x x x x x
                   x x x x x                      0 x x x x
    A = Q_45^T A = x x x x x ,     B = Q_45^T B = 0 0 x x x
                   x x x x x                      0 0 0 x x
                   0 x x x x                      0 0 0 x x

The nonzero entry arising in the (5,4) position in B can be zeroed by
postmultiplying with an appropriate Givens rotation $Z_{45}$:

                 x x x x x                    x x x x x
                 x x x x x                    0 x x x x
    A = A Z_45 = x x x x x ,     B = B Z_45 = 0 0 x x x
                 x x x x x                    0 0 0 x x
                 0 x x x x                    0 0 0 0 x


Zeros are similarly introduced into the (4,1) and (3,1) positions in A:

                   x x x x x                      x x x x x
                   x x x x x                      0 x x x x
    A = Q_34^T A = x x x x x ,     B = Q_34^T B = 0 0 x x x
                   0 x x x x                      0 0 x x x
                   0 x x x x                      0 0 0 0 x

                   x x x x x                      x x x x x
                   x x x x x                      0 x x x x
    A = A Z_34   = x x x x x ,     B = B Z_34   = 0 0 x x x
                   0 x x x x                      0 0 0 x x
                   0 x x x x                      0 0 0 0 x

                   x x x x x                      x x x x x
                   x x x x x                      0 x x x x
    A = Q_23^T A = 0 x x x x ,     B = Q_23^T B = 0 x x x x
                   0 x x x x                      0 0 0 x x
                   0 x x x x                      0 0 0 0 x

                   x x x x x                      x x x x x
                   x x x x x                      0 x x x x
    A = A Z_23   = 0 x x x x ,     B = B Z_23   = 0 0 x x x
                   0 x x x x                      0 0 0 x x
                   0 x x x x                      0 0 0 0 x

A is now upper Hessenberg through its first column. The reduction is
completed by zeroing $a_{52}$, $a_{42}$, and $a_{53}$. As is evident above, two orthogonal
transformations are required for each $a_{ij}$ that is zeroed: one to do the
zeroing and the other to restore B’s triangularity. Either Givens rotations
or 2-by-2 modified Householder transformations can be used. Overall we
have:

Algorithm 7.7.1 (Hessenberg-Triangular Reduction) Given A and
B in $\mathbb{R}^{n \times n}$, the following algorithm overwrites A with an upper Hessenberg
matrix $Q^T A Z$ and B with an upper triangular matrix $Q^T B Z$ where both
Q and Z are orthogonal.

    Using Algorithm 5.2.1, overwrite B with Q^T B = R where
        Q is orthogonal and R is upper triangular.
    A = Q^T A
    for j = 1:n-2
        for i = n:-1:j+2
            [c, s] = givens(A(i-1,j), A(i,j))
            A(i-1:i, j:n) = [c s; -s c]^T A(i-1:i, j:n)
            B(i-1:i, i-1:n) = [c s; -s c]^T B(i-1:i, i-1:n)
            [c, s] = givens(-B(i,i), B(i,i-1))
            B(1:i, i-1:i) = B(1:i, i-1:i) [c s; -s c]
            A(1:n, i-1:i) = A(1:n, i-1:i) [c s; -s c]
        end
    end

This algorithm requires about $8n^3$ flops. The accumulation of Q and Z
requires about $4n^3$ and $3n^3$ flops, respectively.

The reduction of A — AB to Hessenberg-triangular form serves as a 
“front end” decomposition for a generalized QR iteration known as the QZ 
iteration which we describe next. 
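
Algorithm 7.7.1 translates almost line by line into NumPy. The sketch below is an illustration under stated assumptions: the names `givens` and `hess_tri` are hypothetical, `np.linalg.qr` stands in for Algorithm 5.2.1, Q and Z are accumulated explicitly, and new arrays are returned instead of overwriting A and B.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]].T @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, -b / r

def hess_tri(A, B):
    """Return H, T, Q, Z with Q.T @ A @ Z = H (Hessenberg), Q.T @ B @ Z = T (triangular)."""
    n = A.shape[0]
    Q, R = np.linalg.qr(B)            # B = Q R
    A = Q.T @ A
    B = np.triu(R)
    Z = np.eye(n)
    for j in range(n - 2):
        for i in range(n - 1, j + 1, -1):
            # Row rotation in the (i-1, i) plane zeros A[i, j].
            c, s = givens(A[i - 1, j], A[i, j])
            G = np.array([[c, s], [-s, c]])
            A[i - 1:i + 1, j:] = G.T @ A[i - 1:i + 1, j:]
            B[i - 1:i + 1, i - 1:] = G.T @ B[i - 1:i + 1, i - 1:]
            Q[:, i - 1:i + 1] = Q[:, i - 1:i + 1] @ G
            A[i, j] = 0.0
            # Column rotation in the (i-1, i) plane zeros the fill-in B[i, i-1].
            c, s = givens(-B[i, i], B[i, i - 1])
            G = np.array([[c, s], [-s, c]])
            B[:i + 1, i - 1:i + 1] = B[:i + 1, i - 1:i + 1] @ G
            A[:, i - 1:i + 1] = A[:, i - 1:i + 1] @ G
            Z[:, i - 1:i + 1] = Z[:, i - 1:i + 1] @ G
            B[i, i - 1] = 0.0
    return A, B, Q, Z

rng = np.random.default_rng(4)
A0 = rng.standard_normal((6, 6))
B0 = rng.standard_normal((6, 6))
H, T, Q, Z = hess_tri(A0, B0)
print(np.linalg.norm(Q.T @ A0 @ Z - H), np.linalg.norm(Q.T @ B0 @ Z - T))
```

Applying each rotation only to the rows or columns that can be nonzero, as the slices do, is what keeps the operation count at the cubic level quoted above.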
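
Example 7.7.3 If

$$ A = \begin{bmatrix} 10 & 1 & 2 \\ 1 & 2 & -1 \\ 1 & 1 & 2 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} $$

and orthogonal matrices Q and Z are defined by

$$ Q = \begin{bmatrix} -.1231 & -.9917 & .0378 \\ -.4924 & .0279 & -.8699 \\ -.8616 & .1257 & .4917 \end{bmatrix} \quad \text{and} \quad Z = \begin{bmatrix} 1.0000 & 0.0000 & 0.0000 \\ 0.0000 & -.8944 & -.4472 \\ 0.0000 & .4472 & -.8944 \end{bmatrix} $$

then $A_1 = Q^T A Z$ and $B_1 = Q^T B Z$ are given by

$$ A_1 = \begin{bmatrix} -2.5849 & 1.5413 & 2.4221 \\ -9.7631 & .0874 & 1.9239 \\ 0.0000 & 2.7233 & -.7612 \end{bmatrix} \quad \text{and} \quad B_1 = \begin{bmatrix} -8.1240 & 3.6332 & 14.2024 \\ 0.0000 & 0.0000 & 1.8739 \\ 0.0000 & 0.0000 & .7612 \end{bmatrix}. $$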
7.7.5 Deflation


In describing the QZ iteration we may assume without loss of generality that
A is an unreduced upper Hessenberg matrix and that B is a nonsingular
upper triangular matrix. The first of these assertions is obvious, for if
$a_{k+1,k} = 0$ then

$$ A - \lambda B = \begin{bmatrix} A_{11} - \lambda B_{11} & A_{12} - \lambda B_{12} \\ 0 & A_{22} - \lambda B_{22} \end{bmatrix} $$

(with block rows and block columns of dimension k and n-k)
and we may proceed to solve the two smaller problems $A_{11} - \lambda B_{11}$ and
$A_{22} - \lambda B_{22}$. On the other hand, if $b_{kk} = 0$ for some k, then it is possible to
introduce a zero in A’s (n, n-1) position and thereby deflate. Illustrating
by example, suppose n = 5 and k = 3:

        x x x x x             x x x x x
        x x x x x             0 x x x x
    A = 0 x x x x ,       B = 0 0 0 x x
        0 0 x x x             0 0 0 x x
        0 0 0 x x             0 0 0 0 x

The zero on B’s diagonal can be “pushed down” to the (5,5) position as
follows using Givens rotations:

                   x x x x x                      x x x x x
                   x x x x x                      0 x x x x
    A = Q_34^T A = 0 x x x x ,     B = Q_34^T B = 0 0 0 x x
                   0 x x x x                      0 0 0 0 x
                   0 0 0 x x                      0 0 0 0 x

                   x x x x x                      x x x x x
                   x x x x x                      0 x x x x
    A = A Z_23   = 0 x x x x ,     B = B Z_23   = 0 0 0 x x
                   0 0 x x x                      0 0 0 0 x
                   0 0 0 x x                      0 0 0 0 x

                   x x x x x                      x x x x x
                   x x x x x                      0 x x x x
    A = Q_45^T A = 0 x x x x ,     B = Q_45^T B = 0 0 0 x x
                   0 0 x x x                      0 0 0 0 x
                   0 0 x x x                      0 0 0 0 0

                   x x x x x                      x x x x x
                   x x x x x                      0 x x x x
    A = A Z_34   = 0 x x x x ,     B = B Z_34   = 0 0 x x x
                   0 0 x x x                      0 0 0 0 x
                   0 0 0 x x                      0 0 0 0 0

                   x x x x x                      x x x x x
                   x x x x x                      0 x x x x
    A = A Z_45   = 0 x x x x ,     B = B Z_45   = 0 0 x x x
                   0 0 x x x                      0 0 0 x x
                   0 0 0 0 x                      0 0 0 0 0

This zero-chasing technique is perfectly general and can be used to zero
$a_{n,n-1}$ regardless of where the zero appears along B's diagonal.


7.7.6 The QZ Step 


We are now in a position to describe a QZ step. The basic idea is to update
A and B as follows

$$ (\bar{A} - \lambda \bar{B}) = \bar{Q}^T (A - \lambda B) \bar{Z}, $$

where $\bar{A}$ is upper Hessenberg, $\bar{B}$ is upper triangular, $\bar{Q}$ and $\bar{Z}$ are each
orthogonal, and $\bar{A}\bar{B}^{-1}$ is essentially the same matrix that would result if a
Francis QR step (Algorithm 7.5.2) were explicitly applied to $AB^{-1}$. This
can be done with some clever zero-chasing and an appeal to the implicit Q
theorem.

Let $M = AB^{-1}$ (upper Hessenberg) and let v be the first column of the
matrix $(M - aI)(M - bI)$, where a and b are the eigenvalues of M's lower
2-by-2 submatrix. Note that v can be calculated in O(1) flops. If $P_0$ is a
Householder matrix such that $P_0 v$ is a multiple of $e_1$, then

                x x x x x x                  x x x x x x
                x x x x x x                  x x x x x x
    A = P_0 A = x x x x x x ,    B = P_0 B = x x x x x x
                0 0 x x x x                  0 0 0 x x x
                0 0 0 x x x                  0 0 0 0 x x
                0 0 0 0 x x                  0 0 0 0 0 x

The idea now is to restore these matrices to Hessenberg-triangular form by
chasing the unwanted nonzero elements down the diagonal.


To this end, we first determine a pair of Householder matrices $Z_1$ and
$Z_2$ to zero $b_{31}$, $b_{32}$, and $b_{21}$:

                    x x x x x x                      x x x x x x
                    x x x x x x                      0 x x x x x
    A = A Z_1 Z_2 = x x x x x x ,    B = B Z_1 Z_2 = 0 0 x x x x
                    x x x x x x                      0 0 0 x x x
                    0 0 0 x x x                      0 0 0 0 x x
                    0 0 0 0 x x                      0 0 0 0 0 x

Then a Householder matrix $P_1$ is used to zero $a_{31}$ and $a_{41}$:

                x x x x x x                  x x x x x x
                x x x x x x                  0 x x x x x
    A = P_1 A = 0 x x x x x ,    B = P_1 B = 0 x x x x x
                0 x x x x x                  0 x x x x x
                0 0 0 x x x                  0 0 0 0 x x
                0 0 0 0 x x                  0 0 0 0 0 x

Notice that with this step the unwanted nonzero elements have been shifted
down and to the right from their original position. This illustrates a typical
step in the QZ iteration. Notice that $Q = Q_0 Q_1 \cdots Q_{n-2}$ has the same first
column as $Q_0$. By the way the initial Householder matrix was determined,
we can apply the implicit Q theorem and assert that $\bar{A}\bar{B}^{-1} = Q^T (AB^{-1}) Q$
is indeed essentially the same matrix that we would obtain by applying the
Francis iteration to $M = AB^{-1}$ directly. Overall we have:


Algorithm 7.7.2 (The QZ Step) Given an unreduced upper Hessenberg
matrix $A \in \mathbb{R}^{n \times n}$ and a nonsingular upper triangular matrix $B \in \mathbb{R}^{n \times n}$,
the following algorithm overwrites A with the upper Hessenberg matrix
$Q^T A Z$ and B with the upper triangular matrix $Q^T B Z$ where Q and Z are
orthogonal and Q has the same first column as the orthogonal similarity
transformation in Algorithm 7.5.1 when it is applied to $AB^{-1}$.

    Let M = AB^{-1} and compute (M - aI)(M - bI)e_1 = (x, y, z, 0, ..., 0)^T
        where a and b are the eigenvalues of M's lower 2-by-2.
    for k = 1:n-2
        Find Householder Q_k so Q_k [x y z]^T = [* 0 0]^T.
        A = diag(I_{k-1}, Q_k, I_{n-k-2}) A
        B = diag(I_{k-1}, Q_k, I_{n-k-2}) B
        Find Householder Z_{k1} so
            [ b_{k+2,k}  b_{k+2,k+1}  b_{k+2,k+2} ] Z_{k1} = [ 0 0 * ].
        A = A diag(I_{k-1}, Z_{k1}, I_{n-k-2})
        B = B diag(I_{k-1}, Z_{k1}, I_{n-k-2})
        Find Householder Z_{k2} so
            [ b_{k+1,k}  b_{k+1,k+1} ] Z_{k2} = [ 0 * ].
        A = A diag(I_{k-1}, Z_{k2}, I_{n-k-1})
        B = B diag(I_{k-1}, Z_{k2}, I_{n-k-1})
        x = a_{k+1,k};  y = a_{k+2,k}
        if k < n-2
            z = a_{k+3,k}
        end
    end
    Find Householder Q_{n-1} so Q_{n-1} [x y]^T = [* 0]^T.
    A = diag(I_{n-2}, Q_{n-1}) A
    B = diag(I_{n-2}, Q_{n-1}) B
    Find Householder Z_{n-1} so
        [ b_{n,n-1}  b_{nn} ] Z_{n-1} = [ 0 * ].
    A = A diag(I_{n-2}, Z_{n-1})
    B = B diag(I_{n-2}, Z_{n-1})

This algorithm requires $22n^2$ flops. Q and Z can be accumulated for an
additional $8n^2$ flops and $13n^2$ flops, respectively.


7.7.7 The Overall QZ Process 


By applying a sequence of QZ steps to the Hessenberg-triangular pencil
$A - \lambda B$, it is possible to reduce A to quasi-triangular form. In doing this it
is necessary to monitor A’s subdiagonal and B’s diagonal in order to bring
about decoupling whenever possible. The complete process, due to Moler
and Stewart (1973), is as follows:


Algorithm 7.7.3 Given $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times n}$, the following algo-
rithm computes orthogonal Q and Z such that $Q^T A Z = T$ is upper quasi-
triangular and $Q^T B Z = S$ is upper triangular. A is overwritten by T and
B by S.

    Using Algorithm 7.7.1, overwrite A with Q^T A Z (upper Hessenberg)
        and B with Q^T B Z (upper triangular).
    until q = n
        Set all subdiagonal elements in A to zero that satisfy
            |a_{i,i-1}| <= eps (|a_{i-1,i-1}| + |a_{ii}|)
        Find the largest nonnegative q and the smallest nonnegative p
            such that if

                    A11  A12  A13     p
                A =  0   A22  A23     n-p-q
                     0    0   A33     q
                     p  n-p-q  q

            then A33 is upper quasi-triangular and A22 is unreduced
            upper Hessenberg.
        Partition B conformably:

                    B11  B12  B13     p
                B =  0   B22  B23     n-p-q
                     0    0   B33     q
                     p  n-p-q  q

        if q < n
            if B22 is singular
                Zero a_{n-q,n-q-1}
            else
                Apply Algorithm 7.7.2 to A22 and B22
                A = diag(I_p, Q, I_q)^T A diag(I_p, Z, I_q)
                B = diag(I_p, Q, I_q)^T B diag(I_p, Z, I_q)
            end
        end
    end


This algorithm requires $30n^3$ flops. If Q is desired, an additional $16n^3$ are
necessary. If Z is required, an additional $20n^3$ are needed. These estimates
of work are based on the experience that about two QZ iterations per
eigenvalue are necessary. Thus, the convergence properties of QZ are the
same as for QR. The speed of the QZ algorithm is not affected by rank
deficiency in B.


The computed S and T can be shown to satisfy

$$ Q_0^T (A + E) Z_0 = T, \qquad Q_0^T (B + F) Z_0 = S $$

where $Q_0$ and $Z_0$ are exactly orthogonal and $\|E\|_2 \approx \mathbf{u}\|A\|_2$ and $\|F\|_2 \approx
\mathbf{u}\|B\|_2$.


Example 7.7.5 If the QZ algorithm is applied to

        2 3 4 5  6               1 -1 -1 -1 -1
        4 4 5 6  7               0  1 -1 -1 -1
    A = 0 3 6 7  8     and   B = 0  0  1 -1 -1
        0 0 2 8  9               0  0  0  1 -1
        0 0 0 1 10               0  0  0  0  1

then the subdiagonal elements of A converge as follows

    Iteration   O(|h21|)    O(|h32|)    O(|h43|)    O(|h54|)

        1        10^?        10^1        10^0        10^-1
        2        10^0        10^?        10^0        10^-1
        3        10^0        10^1        10^-1       10^-3
        4        10^0        10^?        10^-1       10^-8
        5        10^?        10^1        10^-1       10^-18
        6        10^0        10^0        10^-2       converg.
        7        10^0        10^-1       10^-4
        8        10^1        10^-1       10^-8
        9        10^0        10^-1       10^-19
       10        10^0        10^-?       converg.
       11        10^-1       10^-1
       12        10^-?       10^-11
       13        10^-3       10^-27
       14        converg.    converg.


7.7.8 Generalized Invariant Subspace Computations


Many of the invariant subspace computations discussed in §7.6 carry over to
the generalized eigenvalue problem. For example, approximate eigenvectors
can be found via inverse iteration:

    q^(0) in C^n given.
    for k = 1, 2, ...
        Solve (A - mu B) z^(k) = B q^(k-1)
        Normalize: q^(k) = z^(k) / || z^(k) ||_2
        lambda^(k) = [q^(k)]^H B^H A q^(k) / [q^(k)]^H B^H B q^(k)
    end

When B is nonsingular, this is equivalent to applying (7.6.1) with the
matrix $B^{-1}A$. Typically, only a single iteration is required if $\mu$ is an ap-
proximate eigenvalue computed by the QZ algorithm. By inverse iterat-
ing with the Hessenberg-triangular pencil, costly accumulation of the Z-
transformations during the QZ iteration can be avoided.
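
The iteration above takes only a few lines of NumPy. In the sketch below the function name `generalized_inverse_iteration`, the fixed step count, and the test pencil (built so that its eigenvalues are known to be 1, 2, 3, 4) are assumptions made for illustration; a practical code would factor $A - \mu B$ once and reuse the factorization in every step.

```python
import numpy as np

def generalized_inverse_iteration(A, B, mu, q0, steps=3):
    """Inverse iteration for A x = lambda B x with a fixed shift mu."""
    q = q0 / np.linalg.norm(q0)
    lam = mu
    for _ in range(steps):
        z = np.linalg.solve(A - mu * B, B @ q)     # (A - mu B) z = B q
        q = z / np.linalg.norm(z)
        Bq = B @ q
        # Least squares estimate of lambda from A q ~ lambda B q.
        lam = (Bq.conj() @ (A @ q)) / (Bq.conj() @ Bq)
    return lam, q

# Test pencil with generalized eigenvalues 1, 2, 3, 4 (A = B M).
rng = np.random.default_rng(3)
S = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
M = S @ np.diag([1.0, 2.0, 3.0, 4.0]) @ np.linalg.inv(S)
B = 3.0 * np.eye(4) + 0.3 * rng.standard_normal((4, 4))
A = B @ M

lam, q = generalized_inverse_iteration(A, B, mu=2.01, q0=np.ones(4), steps=5)
print(lam, np.linalg.norm(A @ q - lam * (B @ q)))
```

With a shift as close to an eigenvalue as the one used here, the residual typically drops to roundoff level after very few steps, in line with the remark that a single iteration usually suffices after QZ.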

Corresponding to the notion of an invariant subspace for a single ma-
trix, we have the notion of a deflating subspace for the pencil $A - \lambda B$. In
particular, we say that a k-dimensional subspace $S \subseteq \mathbb{R}^n$ is “deflating” for
the pencil $A - \lambda B$ if the subspace $\{ Ax + By : x, y \in S \}$ has dimension k or
less. Note that the columns of the matrix Z in the generalized Schur decom-
position define a family of deflating subspaces, for if $Q = [q_1,\ldots,q_n]$ and
$Z = [z_1,\ldots,z_n]$ then we have $\mathrm{span}\{Az_1,\ldots,Az_k\} \subseteq \mathrm{span}\{q_1,\ldots,q_k\}$ and
$\mathrm{span}\{Bz_1,\ldots,Bz_k\} \subseteq \mathrm{span}\{q_1,\ldots,q_k\}$. Properties of deflating subspaces
and their behavior under perturbation are described in Stewart (1972).


Problems 


P7.7.1 Suppose A and B are in $\mathbb{R}^{n \times n}$ and that

$$ U^T B V = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}, \qquad U = [\,U_1 \ U_2\,], \qquad V = [\,V_1 \ V_2\,] $$

(with blocks of dimension r and n-r) is the SVD of B, where D is r-by-r and r = rank(B).
Show that if $\lambda(A,B) = \mathbb{C}$ then $U_2^T A V_2$ is singular.

P7.7.2 Define $F : \mathbb{R}^n \to \mathbb{R}$ by

$$ F(x) = \frac{1}{2} \left\| Ax - \frac{x^T B^T A x}{x^T B^T B x}\, Bx \right\|_2^2 $$

where A and B are in $\mathbb{R}^{n \times n}$. Show that if $\nabla F(x) = 0$, then Ax is a multiple of Bx.


P7.7.3 Suppose A and B are in $\mathbb{R}^{n \times n}$. Give an algorithm for computing orthogonal Q
and Z such that $Q^T A Z$ is upper Hessenberg and $Z^T B Q$ is upper triangular.


P7.7.4 Suppose

$$ A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{bmatrix} $$

with $A_{11}, B_{11} \in \mathbb{R}^{k \times k}$ and $A_{22}, B_{22} \in \mathbb{R}^{j \times j}$. Under what circumstances do there exist

$$ X = \begin{bmatrix} I_k & X_{12} \\ 0 & I_j \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} I_k & Y_{12} \\ 0 & I_j \end{bmatrix} $$

so that $Y^{-1}AX$ and $Y^{-1}BX$ are both block diagonal? This is the generalized Sylvester
equation problem. Specify an algorithm for the case when $A_{11}$, $A_{22}$, $B_{11}$, and $B_{22}$ are
upper triangular. See Kågström (1994).

P7.7.5 Suppose $\mu \notin \lambda(A,B)$. Relate the eigenvalues and eigenvectors of $A_1 = (A - \mu B)^{-1}A$
and $B_1 = (A - \mu B)^{-1}B$ to the generalized eigenvalues and eigenvectors of
$A - \lambda B$.

P7.7.6 Suppose $A, B, C, D \in \mathbb{R}^{n \times n}$. Show how to compute orthogonal matrices Q, Z, U,
and V such that $Q^T A U$ is upper Hessenberg and $V^T C Z$, $Q^T B V$, and $V^T D Z$ are all
upper triangular. Note that this converts the pencil $AC - \lambda BD$ to Hessenberg-triangular
form. Your algorithm should not form the products AC or BD explicitly and should
not compute any matrix inverse. See Van Loan (1975).


Notes and References for Sec. 7.7 


Mathematical aspects of the generalized eigenvalue problem are covered in


Chapter 8


The Symmetric 
Eigenvalue Problem 


§8.1 Properties and Decompositions 

§8.2 Power Iterations 

§8.3 The Symmetric QR Algorithm

§8.4 Jacobi Methods 

§8.5 Tridiagonal Methods 

§8.6 Computing the SVD 

§8.7 Some Generalized Eigenvalue Problems 


The symmetric eigenvalue problem with its rich mathematical structure
is one of the most aesthetically pleasing problems in numerical linear
algebra. We begin our presentation with a brief discussion of the
mathematical properties that underlie this computation. In §8.2 and §8.3 we
develop various power iterations, eventually focusing on the symmetric QR
algorithm.

In §8.4 we discuss Jacobi's method, one of the earliest matrix algorithms
to appear in the literature. This technique is of current interest because it is
amenable to parallel computation and because under certain circumstances
it has superior accuracy.

Various methods for the tridiagonal case are presented in §8.5. These
include the method of bisection and a divide and conquer technique.

The computation of the singular value decomposition is detailed in §8.6.
The central algorithm is a variant of the symmetric QR iteration that works
on bidiagonal matrices.

In the final section we discuss the generalized eigenvalue problem
$Ax = \lambda Bx$ for the important case when A is symmetric and B is symmetric
positive definite. No suitable analog of the orthogonally-based QZ
algorithm (see §7.7) exists for this specially structured, generalized
eigenproblem. However, there are several successful methods that can be applied
and these are presented along with a discussion of the generalized singular
value decomposition.


Before You Begin 


Chapter 1, §§2.1-2.5, and §2.7, Chapter 3, §§4.1-4.3, §§5.1-5.5 and §7.1.1
are assumed. Within this chapter there are the following dependencies:


              §8.4
               ↑
§8.1 → §8.2 → §8.3 → §8.6 → §8.7
               ↓
              §8.5


Many of the algorithms and theorems in this chapter have unsymmetric 
counterparts in Chapter 7. However, except for a few concepts and defini- 
tions, our treatment of the symmetric eigenproblem can be studied before 
reading Chapter 7. 

Complementary references include Wilkinson (1965), Stewart (1973), 
Gourlay and Watson (1973), Hager (1988), Chatelin (1993), Parlett (1980), 
Stewart and Sun (1990), Watkins (1991), Jennings and McKeowen (1992), 
and Datta (1995). Some Matlab functions important to this chapter are 
schur and svd. LAPACK connections include 


LAPACK: Symmetric Eigenproblem

_SYEV  | All eigenvalues and vectors
_SYEVD | Same but uses divide and conquer for eigenvectors
_SYEVX | Selected eigenvalues and vectors
_SYTRD | Householder tridiagonalization
_SBTRD | Householder tridiagonalization (A banded)
_SPTRD | Householder tridiagonalization (A in packed storage)
_STEQR | All eigenvalues and vectors of tridiagonal by implicit QR
_STEDC | All eigenvalues and vectors of tridiagonal by divide and conquer
_STERF | All eigenvalues of tridiagonal by root-free QR
_PTEQR | All eigenvalues and eigenvectors of positive definite tridiagonal
_STEBZ | Selected eigenvalues of tridiagonal by bisection
_STEIN | Selected eigenvectors of tridiagonal by inverse iteration


LAPACK: Symmetric-Definite Eigenproblems

_SYGST | Converts $A - \lambda B$ to $C - \lambda I$ form
_PBSTF | Split Cholesky factorization
_SBGST | Converts banded $A - \lambda B$ to $C - \lambda I$ form via split Cholesky


LAPACK: SVD

_GESVD | $A = U \Sigma V^T$
_BDSQR | SVD of real bidiagonal matrix
_GEBRD | Bidiagonalization of general matrix
_ORGBR | Generates the orthogonal transformations
_GBBRD | Bidiagonalization of band matrix


LAPACK: The Generalized Singular Value Problem

_GGSVP | Converts $A^T A - \mu^2 B^T B$ to triangular $A_1^T A_1 - \mu^2 B_1^T B_1$
_TGSJA | Computes GSVD of a pair of triangular matrices


8.1 Properties and Decompositions 


In this section we set down the mathematics that is required to develop 
and analyze algorithms for the symmetric eigenvalue problem. 


8.1.1 Eigenvalues and Eigenvectors 


Symmetry guarantees that all of A's eigenvalues are real and that there is 
an orthonormal basis of eigenvectors. 


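Theorem 8.1.1 (Symmetric Schur Decomposition) If $A \in \mathbb{R}^{n\times n}$ is
symmetric, then there exists a real orthogonal Q such that

$$Q^T A Q = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n).$$

Moreover, for k = 1:n, $AQ(:,k) = \lambda_k Q(:,k)$. See Theorem 7.1.3.


Proof. Suppose $\lambda_1 \in \lambda(A)$ and that $x \in \mathbb{C}^n$ is a unit 2-norm eigenvector
with $Ax = \lambda_1 x$. Since $\lambda_1 = x^H A x = x^H A^H x = \overline{x^H A x} = \bar{\lambda}_1$, it follows
that $\lambda_1 \in \mathbb{R}$. Thus, we may assume that $x \in \mathbb{R}^n$. Let $P_1 \in \mathbb{R}^{n\times n}$ be
a Householder matrix such that $P_1^T x = e_1 = I_n(:,1)$. It follows from
$Ax = \lambda_1 x$ that $(P_1^T A P_1) e_1 = \lambda_1 e_1$. This says that the first column of
$P_1^T A P_1$ is a multiple of $e_1$. But since $P_1^T A P_1$ is symmetric it must have
the form

$$P_1^T A P_1 = \begin{bmatrix} \lambda_1 & 0 \\ 0 & A_1 \end{bmatrix}$$

where $A_1 \in \mathbb{R}^{(n-1)\times(n-1)}$ is symmetric. By induction we may assume that
there is an orthogonal $Q_1 \in \mathbb{R}^{(n-1)\times(n-1)}$ such that $Q_1^T A_1 Q_1 = \Lambda_1$ is
diagonal. The theorem follows by setting

$$Q = P_1 \begin{bmatrix} 1 & 0 \\ 0 & Q_1 \end{bmatrix} \qquad \text{and} \qquad \Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \Lambda_1 \end{bmatrix}$$

and comparing columns in the matrix equation $AQ = Q\Lambda$. □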


Example 8.1.1 If

$$A = \begin{bmatrix} 6.8 & 2.4 \\ 2.4 & 8.2 \end{bmatrix} \qquad \text{and} \qquad Q = \begin{bmatrix} .6 & -.8 \\ .8 & .6 \end{bmatrix},$$

then Q is orthogonal and $Q^T A Q = \mathrm{diag}(10, 5)$.

For a symmetric matrix A we shall use the notation $\lambda_k(A)$ to designate the
kth largest eigenvalue. Thus,

$$\lambda_n(A) \le \cdots \le \lambda_2(A) \le \lambda_1(A).$$

It follows from the orthogonal invariance of the 2-norm that A has singular
values $\{ |\lambda_1(A)|, \ldots, |\lambda_n(A)| \}$ and so

$$\| A \|_2 = \max\{ |\lambda_1(A)|, |\lambda_n(A)| \}.$$

The eigenvalues of a symmetric matrix have a "minimax" characterization
based on the values that can be assumed by the quadratic form ratio
$x^T A x / x^T x$.

Theorem 8.1.2 (Courant-Fischer Minimax Theorem) If $A \in \mathbb{R}^{n\times n}$
is symmetric, then

$$\lambda_k(A) = \max_{\dim(S) = k} \; \min_{0 \ne y \in S} \frac{y^T A y}{y^T y}$$

for k = 1:n.


Proof. Let $Q^T A Q = \mathrm{diag}(\lambda_i)$ be the Schur decomposition with $\lambda_k = \lambda_k(A)$
and $Q = [q_1, q_2, \ldots, q_n]$. Define

$$S_k = \mathrm{span}\{q_1, \ldots, q_k\},$$

the invariant subspace associated with $\lambda_1, \ldots, \lambda_k$. It is easy to show that

$$\max_{\dim(S) = k} \; \min_{0 \ne y \in S} \frac{y^T A y}{y^T y} \;\ge\; \min_{0 \ne y \in S_k} \frac{y^T A y}{y^T y} \;=\; q_k^T A q_k \;=\; \lambda_k(A).$$

To establish the reverse inequality, let S be any k-dimensional subspace and
note that it must intersect $\mathrm{span}\{q_k, \ldots, q_n\}$, a subspace that has dimension
n - k + 1. If $y_* = \alpha_k q_k + \cdots + \alpha_n q_n$ is in this intersection, then

$$\min_{0 \ne y \in S} \frac{y^T A y}{y^T y} \;\le\; \frac{y_*^T A y_*}{y_*^T y_*} \;\le\; \lambda_k(A).$$

Since this inequality holds for all k-dimensional subspaces,

$$\max_{\dim(S) = k} \; \min_{0 \ne y \in S} \frac{y^T A y}{y^T y} \;\le\; \lambda_k(A),$$

thereby completing the proof of the theorem. □


If $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite, then $\lambda_n(A) > 0$.

8.1.2 Eigenvalue Sensitivity


An important solution framework for the symmetric eigenproblem involves
the production of a sequence of orthogonal transformations $\{Q_k\}$ with the
property that the matrices $Q_k^T A Q_k$ are progressively "more diagonal." The
question naturally arises, how well do the diagonal elements of a matrix
approximate its eigenvalues?


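Theorem 8.1.3 (Gershgorin) Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that
$Q \in \mathbb{R}^{n\times n}$ is orthogonal. If $Q^T A Q = D + F$ where $D = \mathrm{diag}(d_1, \ldots, d_n)$
and F has zero diagonal entries, then

$$\lambda(A) \subseteq \bigcup_{i=1}^{n} [\, d_i - r_i, \; d_i + r_i \,]$$

where $r_i = \sum_{j=1}^{n} |f_{ij}|$ for i = 1:n. See Theorem 7.2.1.


Proof. Suppose $\lambda \in \lambda(A)$ and assume without loss of generality that $\lambda \ne d_i$
for i = 1:n. Since $(D - \lambda I) + F$ is singular, it follows from Lemma 2.3.3
that

$$1 \;\le\; \| (D - \lambda I)^{-1} F \|_\infty \;=\; \sum_{j=1}^{n} \frac{|f_{kj}|}{|d_k - \lambda|} \;=\; \frac{r_k}{|d_k - \lambda|}$$

for some k, $1 \le k \le n$. But this implies that $\lambda \in [d_k - r_k, d_k + r_k]$. □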


Example 8.1.2 The matrix

$$A = \begin{bmatrix} 2.0000 & 0.1000 & 0.2000 \\ 0.2000 & 5.0000 & 0.3000 \\ 0.1000 & 0.3000 & -1.0000 \end{bmatrix}$$

has Gershgorin intervals [1.7, 2.3], [4.5, 5.5], and [-1.4, -.6] and eigenvalues 1.9984,
5.0224, and -1.0208.
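
As a quick numerical check of Theorem 8.1.3 with Q = I (so that D is the diagonal of A and F is the off-diagonal part), the following Python sketch recomputes the intervals of Example 8.1.2. The function name `gershgorin_intervals` is illustrative and not from the text, and the matrix is entered exactly as printed in the example.

```python
def gershgorin_intervals(a):
    """Return [(d_i - r_i, d_i + r_i)], where r_i sums |a_ij| over j != i."""
    intervals = []
    for i, row in enumerate(a):
        radius = sum(abs(v) for j, v in enumerate(row) if j != i)
        intervals.append((row[i] - radius, row[i] + radius))
    return intervals


# Matrix of Example 8.1.2, entered as printed there.
A = [[2.0, 0.1, 0.2],
     [0.2, 5.0, 0.3],
     [0.1, 0.3, -1.0]]

intervals = gershgorin_intervals(A)
for lo, hi in intervals:
    print(f"[{lo:.1f}, {hi:.1f}]")
```

Each quoted eigenvalue (1.9984, 5.0224, -1.0208) lies in one of the printed intervals, which is all the theorem promises; it does not say that every interval contains exactly one eigenvalue.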


The next results show that if A is perturbed by a symmetric matrix E,
then its eigenvalues do not move by more than $\| E \|_F$.


Theorem 8.1.4 (Wielandt-Hoffman) If A and A + E are n-by-n
symmetric matrices, then

$$\sum_{i=1}^{n} \left( \lambda_i(A + E) - \lambda_i(A) \right)^2 \;\le\; \| E \|_F^2 .$$

Proof. A proof can be found in Wilkinson (1965, pp.104-8) or Stewart
and Sun (1991, pp.189-191). See also P8.1.5. □


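Example 8.1.3 If

$$A = \begin{bmatrix} 6.8 & 2.4 \\ 2.4 & 8.2 \end{bmatrix} \qquad \text{and} \qquad E = \begin{bmatrix} .002 & .003 \\ .003 & .001 \end{bmatrix},$$

then $\lambda(A) = \{5, 10\}$ and $\lambda(A + E) = \{4.9988, 10.004\}$, confirming that

$$1.95 \times 10^{-5} = |4.9988 - 5|^2 + |10.004 - 10|^2 \;\le\; \| E \|_F^2 = 2.3 \times 10^{-5}.$$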


Theorem 8.1.5 If A and A + E are n-by-n symmetric matrices, then

$$\lambda_k(A) + \lambda_n(E) \;\le\; \lambda_k(A + E) \;\le\; \lambda_k(A) + \lambda_1(E), \qquad k = 1{:}n.$$


Proof. This follows from the minimax characterization. See Wilkinson
(1965, pp.101-2) or Stewart and Sun (1990, p.203). □


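Example 8.1.4 If

$$A = \begin{bmatrix} 6.8 & 2.4 \\ 2.4 & 8.2 \end{bmatrix} \qquad \text{and} \qquad E = \begin{bmatrix} .002 & .003 \\ .003 & .001 \end{bmatrix},$$

then $\lambda(A) = \{5, 10\}$, $\lambda(E) = \{-.0015, .0045\}$, and $\lambda(A + E) = \{4.9988, 10.0042\}$,
confirming that

$$5 - .0015 \;\le\; 4.9988 \;\le\; 5 + .0045$$
$$10 - .0015 \;\le\; 10.0042 \;\le\; 10 + .0045.$$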


Corollary 8.1.6 If A and A + E are n-by-n symmetric matrices, then

$$| \lambda_k(A + E) - \lambda_k(A) | \;\le\; \| E \|_2$$

for k = 1:n.

Proof.

$$| \lambda_k(A + E) - \lambda_k(A) | \;\le\; \max\{ |\lambda_n(E)|, |\lambda_1(E)| \} \;=\; \| E \|_2 . \quad \Box$$
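
The three bounds above (Theorem 8.1.4, Theorem 8.1.5, and Corollary 8.1.6) can be checked on the 2-by-2 data of Examples 8.1.3 and 8.1.4. The sketch below is only an illustration: it relies on the closed-form eigenvalues of a symmetric 2-by-2 matrix, and the helper name `eig2` is an assumption of this example rather than anything defined in the text.

```python
import math


def eig2(a, b, d):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean + radius, mean - radius


lam_A = eig2(6.8, 2.4, 8.2)          # A of Example 8.1.3
lam_E = eig2(0.002, 0.003, 0.001)    # the perturbation E
lam_AE = eig2(6.802, 2.403, 8.201)   # A + E

# Eigenvalue shifts lambda_k(A + E) - lambda_k(A), largest eigenvalue first.
shifts = [p - q for p, q in zip(lam_AE, lam_A)]
wh_lhs = sum(s * s for s in shifts)                     # left side of Theorem 8.1.4
fro_sq = 0.002 ** 2 + 2 * 0.003 ** 2 + 0.001 ** 2       # ||E||_F^2
two_norm = max(abs(lam_E[0]), abs(lam_E[1]))            # ||E||_2 for symmetric E

print(wh_lhs <= fro_sq)                                 # Theorem 8.1.4
print(all(lam_E[1] <= s <= lam_E[0] for s in shifts))   # Theorem 8.1.5
print(max(abs(s) for s in shifts) <= two_norm)          # Corollary 8.1.6
```

The same pattern extends to larger matrices once a general symmetric eigenvalue routine is available; the closed form is used here only to keep the sketch free of dependencies.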


Several more useful perturbation results follow from the minimax property. 


Theorem 8.1.7 (Interlacing Property) If $A \in \mathbb{R}^{n\times n}$ is symmetric and
$A_r = A(1{:}r, 1{:}r)$, then

$$\lambda_{r+1}(A_{r+1}) \le \lambda_r(A_r) \le \lambda_r(A_{r+1}) \le \cdots \le \lambda_2(A_{r+1}) \le \lambda_1(A_r) \le \lambda_1(A_{r+1})$$

for r = 1:n-1.

Proof. Wilkinson (1965, pp.103-4). □


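Example 8.1.5 If

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix}$$

then $\lambda(A_1) = \{1\}$, $\lambda(A_2) = \{.3820, 2.6180\}$, $\lambda(A_3) = \{.1270, 1.0000, 7.873\}$, and
$\lambda(A_4) = \{.0380, .4538, 2.2034, 26.3047\}$.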


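Theorem 8.1.8 Suppose $B = A + \tau c c^T$ where $A \in \mathbb{R}^{n\times n}$ is symmetric,
$c \in \mathbb{R}^n$ has unit 2-norm and $\tau \in \mathbb{R}$. If $\tau \ge 0$, then

$$\lambda_i(B) \in [\, \lambda_i(A), \lambda_{i-1}(A) \,], \qquad i = 2{:}n,$$

while if $\tau \le 0$ then

$$\lambda_i(B) \in [\, \lambda_{i+1}(A), \lambda_i(A) \,], \qquad i = 1{:}n-1.$$

In either case, there exist nonnegative $m_1, \ldots, m_n$ such that

$$\lambda_i(B) = \lambda_i(A) + m_i \tau, \qquad i = 1{:}n,$$

with $m_1 + \cdots + m_n = 1$.


Proof. Wilkinson (1965, pp.94-97). See also P8.1.8. □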


8.1.3 Invariant Subspaces 


Many eigenvalue computations proceed by breaking the original problem 
into a collection of smaller subproblems. The following result is the basis 
for this solution framework. 


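Theorem 8.1.9 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that

$$Q = [\, Q_1 \;\; Q_2 \,], \qquad Q_1 \in \mathbb{R}^{n\times r}, \quad Q_2 \in \mathbb{R}^{n\times(n-r)},$$

is orthogonal. If $\mathrm{ran}(Q_1)$ is an invariant subspace, then

$$Q^T A Q = D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}, \qquad D_1 \in \mathbb{R}^{r\times r}, \quad D_2 \in \mathbb{R}^{(n-r)\times(n-r)}, \qquad (8.1.1)$$

and $\lambda(A) = \lambda(D_1) \cup \lambda(D_2)$. See also Lemma 7.1.2.

Proof. If

$$Q^T A Q = \begin{bmatrix} D_1 & E_{21}^T \\ E_{21} & D_2 \end{bmatrix},$$

then from $AQ = Q(Q^T A Q)$ we have $AQ_1 - Q_1 D_1 = Q_2 E_{21}$. Since $\mathrm{ran}(Q_1)$ is
invariant, the columns of $Q_2 E_{21}$ are also in $\mathrm{ran}(Q_1)$ and therefore
perpendicular to the columns of $Q_2$. Thus,

$$0 = Q_2^T (AQ_1 - Q_1 D_1) = Q_2^T Q_2 E_{21} = E_{21},$$

and so (8.1.1) holds. It is easy to show

$$\det(A - \lambda I_n) = \det(Q^T A Q - \lambda I_n) = \det(D_1 - \lambda I_r) \det(D_2 - \lambda I_{n-r}),$$

confirming that $\lambda(A) = \lambda(D_1) \cup \lambda(D_2)$. □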


The sensitivity to perturbation of an invariant subspace depends upon
the separation of the associated eigenvalues from the rest of the spectrum.
The appropriate measure of separation between the eigenvalues of two
symmetric matrices B and C is given by

$$\mathrm{sep}(B, C) = \min_{\lambda \in \lambda(B), \; \mu \in \lambda(C)} | \lambda - \mu |. \qquad (8.1.2)$$


With this definition we have 


Theorem 8.1.10 Suppose A and A + E are n-by-n symmetric matrices
and that

$$Q = [\, Q_1 \;\; Q_2 \,], \qquad Q_1 \in \mathbb{R}^{n\times r}, \quad Q_2 \in \mathbb{R}^{n\times(n-r)},$$

is an orthogonal matrix such that $\mathrm{ran}(Q_1)$ is an invariant subspace for A.
Partition the matrices $Q^T A Q$ and $Q^T E Q$ as follows:

$$Q^T A Q = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}, \qquad Q^T E Q = \begin{bmatrix} E_{11} & E_{21}^T \\ E_{21} & E_{22} \end{bmatrix},$$

where the leading blocks are r-by-r. If $\mathrm{sep}(D_1, D_2) > 0$ and

$$\| E \|_2 \;\le\; \frac{\mathrm{sep}(D_1, D_2)}{5},$$

then there exists a matrix $P \in \mathbb{R}^{(n-r)\times r}$ with

$$\| P \|_2 \;\le\; \frac{4}{\mathrm{sep}(D_1, D_2)} \| E_{21} \|_2$$

such that the columns of $\hat{Q}_1 = (Q_1 + Q_2 P)(I + P^T P)^{-1/2}$ define an
orthonormal basis for a subspace that is invariant for A + E. See also Theorem
7.2.4.

Proof. This result is a slight adaptation of Theorem 4.11 in Stewart
(1973). The matrix $(I + P^T P)^{-1/2}$ is the inverse of the square root of
$I + P^T P$. See §4.2.10. □


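Corollary 8.1.11 If the conditions of the theorem hold, then

$$\mathrm{dist}(\mathrm{ran}(Q_1), \mathrm{ran}(\hat{Q}_1)) \;\le\; \frac{4}{\mathrm{sep}(D_1, D_2)} \| E_{21} \|_2 .$$

See also Corollary 7.2.5.

Proof. It can be shown using the SVD that

$$\| P (I + P^T P)^{-1/2} \|_2 \;\le\; \| P \|_2 . \qquad (8.1.3)$$

Since $Q_2^T \hat{Q}_1 = P (I + P^T P)^{-1/2}$ it follows that

$$\mathrm{dist}(\mathrm{ran}(Q_1), \mathrm{ran}(\hat{Q}_1)) = \| Q_2^T \hat{Q}_1 \|_2 = \| P (I + P^T P)^{-1/2} \|_2 \le \| P \|_2 \le 4 \| E_{21} \|_2 / \mathrm{sep}(D_1, D_2). \quad \Box$$


Thus, the reciprocal of $\mathrm{sep}(D_1, D_2)$ can be thought of as a condition number
that measures the sensitivity of $\mathrm{ran}(Q_1)$ as an invariant subspace.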

The effect of perturbations on a single eigenvector is important enough
that we specialize the above results to this case.


Theorem 8.1.12 Suppose A and A + E are n-by-n symmetric matrices
and that

$$Q = [\, q_1 \;\; Q_2 \,], \qquad q_1 \in \mathbb{R}^{n}, \quad Q_2 \in \mathbb{R}^{n\times(n-1)},$$

is an orthogonal matrix such that $q_1$ is an eigenvector for A. Partition the
matrices $Q^T A Q$ and $Q^T E Q$ as follows:

$$Q^T A Q = \begin{bmatrix} \lambda & 0 \\ 0 & D_2 \end{bmatrix}, \qquad Q^T E Q = \begin{bmatrix} \epsilon & e^T \\ e & E_{22} \end{bmatrix},$$

where the leading blocks are 1-by-1. If

$$d = \min_{\mu \in \lambda(D_2)} | \lambda - \mu | > 0 \qquad \text{and} \qquad \| E \|_2 \le \frac{d}{5},$$

then there exists $p \in \mathbb{R}^{n-1}$ satisfying

$$\| p \|_2 \;\le\; \frac{4}{d} \| e \|_2$$

such that $\hat{q}_1 = (q_1 + Q_2 p) / \sqrt{1 + p^T p}$ is a unit 2-norm eigenvector for A + E.
Moreover,

$$\mathrm{dist}(\mathrm{span}\{q_1\}, \mathrm{span}\{\hat{q}_1\}) = \sqrt{1 - (q_1^T \hat{q}_1)^2} \;\le\; \frac{4}{d} \| e \|_2 .$$

See also Corollary 7.2.6.


Proof. Apply Theorem 8.1.10 and Corollary 8.1.11 with r = 1 and observe
that if $D_1 = (\lambda)$, then $d = \mathrm{sep}(D_1, D_2)$. □


Example 8.1.6 If A = diag(.999, 1.001, 2.), and

$$E = \begin{bmatrix} 0.00 & 0.01 & 0.01 \\ 0.01 & 0.00 & 0.01 \\ 0.01 & 0.01 & 0.00 \end{bmatrix},$$

then $\hat{Q}^T (A + E) \hat{Q} = \mathrm{diag}(.9899, 1.0098, 2.0002)$ where

$$\hat{Q} = \begin{bmatrix} -.7418 & .6706 & .0101 \\ .6708 & .7417 & .0101 \\ .0007 & -.0143 & .9999 \end{bmatrix}$$

is orthogonal. Let $\hat{q}_i = \hat{Q} e_i$, i = 1, 2, 3. Thus, $\hat{q}_i$ is the perturbation of A's eigenvector
$q_i = e_i$. A calculation shows that

$$\mathrm{dist}(\mathrm{span}\{q_1\}, \mathrm{span}\{\hat{q}_1\}) = \mathrm{dist}(\mathrm{span}\{q_2\}, \mathrm{span}\{\hat{q}_2\}) = .67 .$$

Thus, because they are associated with nearby eigenvalues, the eigenvectors $q_1$ and $q_2$
cannot be computed accurately. On the other hand, since $\lambda_1$ and $\lambda_2$ are well separated
from $\lambda_3$, they define a two-dimensional subspace that is not particularly sensitive, as
$\mathrm{dist}(\mathrm{span}\{q_1, q_2\}, \mathrm{span}\{\hat{q}_1, \hat{q}_2\}) = .01$.


8.1.4 Approximate Invariant Subspaces 


If the columns of $Q_1 \in \mathbb{R}^{n\times r}$ are independent and the residual matrix
$R = AQ_1 - Q_1 S$ is small for some $S \in \mathbb{R}^{r\times r}$, then the columns of $Q_1$ define an
approximate invariant subspace. Let us discover what we can say about
the eigensystem of A when in the possession of such a matrix.


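Theorem 8.1.13 Suppose $A \in \mathbb{R}^{n\times n}$ and $S \in \mathbb{R}^{r\times r}$ are symmetric and
that

$$AQ_1 - Q_1 S = E_1,$$

where $Q_1 \in \mathbb{R}^{n\times r}$ satisfies $Q_1^T Q_1 = I_r$. Then there exist $\mu_1, \ldots, \mu_r \in \lambda(A)$
such that

$$| \mu_k - \lambda_k(S) | \;\le\; \sqrt{2} \, \| E_1 \|_2$$

for k = 1:r.

Proof. Let $Q_2 \in \mathbb{R}^{n\times(n-r)}$ be any matrix such that $Q = [\, Q_1, Q_2 \,]$ is
orthogonal. It follows that

$$Q^T A Q = \begin{bmatrix} S & 0 \\ 0 & Q_2^T A Q_2 \end{bmatrix} + \begin{bmatrix} Q_1^T E_1 & E_1^T Q_2 \\ Q_2^T E_1 & 0 \end{bmatrix} \equiv B + E$$

and so by using Corollary 8.1.6 we have $| \lambda_k(A) - \lambda_k(B) | \le \| E \|_2$ for
k = 1:n. Since $\lambda(S) \subseteq \lambda(B)$, there exist $\mu_1, \ldots, \mu_r \in \lambda(A)$ such that

$$| \mu_k - \lambda_k(S) | \;\le\; \| E \|_2$$

for k = 1:r. The theorem follows by noting that for any $x \in \mathbb{R}^r$ and
$y \in \mathbb{R}^{n-r}$ we have

$$\left\| E \begin{bmatrix} x \\ y \end{bmatrix} \right\|_2 \;\le\; \| E_1 x \|_2 + \| E_1^T Q_2 y \|_2 \;\le\; \| E_1 \|_2 \| x \|_2 + \| E_1 \|_2 \| y \|_2,$$

from which we readily conclude that $\| E \|_2 \le \sqrt{2} \, \| E_1 \|_2$. □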


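Example 8.1.7 If

$$A = \begin{bmatrix} 6.8 & 2.4 \\ 2.4 & 8.2 \end{bmatrix}, \qquad Q_1 = \begin{bmatrix} .7994 \\ -.6007 \end{bmatrix}, \qquad \text{and} \qquad S = (5.1) \in \mathbb{R},$$

then

$$AQ_1 - Q_1 S \approx \begin{bmatrix} -.0827 \\ .0564 \end{bmatrix} = E_1 .$$

The theorem predicts that A has an eigenvalue within $\sqrt{2} \, \| E_1 \|_2 \approx .1415$ of 5.1. This
is true since $\lambda(A) = \{5, 10\}$.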
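
The arithmetic behind Example 8.1.7 is easy to reproduce. The following sketch, written with plain Python lists, forms the residual $E_1 = AQ_1 - Q_1 S$ for the data above and evaluates the bound of Theorem 8.1.13. It then replaces the trial value 5.1 by the Rayleigh quotient $q^T A q / q^T q$ to illustrate the minimizing choice of S discussed in Theorem 8.1.14. The variable names are illustrative, and the division by $q^T q$ is an assumption made here because the four-digit vector is only approximately of unit length.

```python
import math

A = [[6.8, 2.4], [2.4, 8.2]]
q = [0.7994, -0.6007]          # approximately unit 2-norm (four-digit data)
s = 5.1                        # trial eigenvalue; S is 1-by-1 here

# Residual E_1 = A q - q s and the bound sqrt(2) * ||E_1||_2.
Aq = [sum(A[i][j] * q[j] for j in range(2)) for i in range(2)]
residual = [Aq[i] - s * q[i] for i in range(2)]
res_norm = math.hypot(residual[0], residual[1])
bound = math.sqrt(2.0) * res_norm

# Rayleigh quotient: the residual-minimizing scalar for this q.
ritz = sum(q[i] * Aq[i] for i in range(2)) / sum(v * v for v in q)
ritz_res = math.hypot(Aq[0] - ritz * q[0], Aq[1] - ritz * q[1])

print(round(bound, 4), round(ritz, 4), ritz_res < res_norm)
```

The Rayleigh quotient lands much closer to the eigenvalue 5 than the trial value 5.1 does, and its residual is correspondingly smaller.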


The eigenvalue bounds in Theorem 8.1.13 depend on $\| AQ_1 - Q_1 S \|_2$.
Given A and $Q_1$, the following theorem indicates how to choose S so that
this quantity is minimized in the Frobenius norm.


Theorem 8.1.14 If $A \in \mathbb{R}^{n\times n}$ is symmetric and $Q_1 \in \mathbb{R}^{n\times r}$ has
orthonormal columns, then

$$\min_{S \in \mathbb{R}^{r\times r}} \| AQ_1 - Q_1 S \|_F = \| (I - Q_1 Q_1^T) A Q_1 \|_F$$

and $S = Q_1^T A Q_1$ is the minimizer.


Proof. Let $Q_2 \in \mathbb{R}^{n\times(n-r)}$ be such that $Q = [\, Q_1, Q_2 \,]$ is orthogonal. For
any $S \in \mathbb{R}^{r\times r}$ we have

$$\| AQ_1 - Q_1 S \|_F^2 = \| Q^T A Q_1 - Q^T Q_1 S \|_F^2 = \| Q_1^T A Q_1 - S \|_F^2 + \| Q_2^T A Q_1 \|_F^2 .$$

Clearly, the minimizing S is given by $S = Q_1^T A Q_1$. □


This result enables us to associate any r-dimensional subspace $\mathrm{ran}(Q_1)$
with a set of r "optimal" eigenvalue-eigenvector approximates.


Theorem 8.1.15 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that $Q_1 \in \mathbb{R}^{n\times r}$
satisfies $Q_1^T Q_1 = I_r$. If

$$Z^T (Q_1^T A Q_1) Z = \mathrm{diag}(\theta_1, \ldots, \theta_r) = D$$

is the Schur decomposition of $Q_1^T A Q_1$ and $Q_1 Z = [\, y_1, \ldots, y_r \,]$, then

$$\| A y_k - \theta_k y_k \|_2 = \| (I - Q_1 Q_1^T) A Q_1 Z e_k \|_2 \;\le\; \| (I - Q_1 Q_1^T) A Q_1 \|_2$$

for k = 1:r.

Proof.

$$A y_k - \theta_k y_k = A Q_1 Z e_k - Q_1 Z D e_k = (A Q_1 - Q_1 (Q_1^T A Q_1)) Z e_k .$$

The theorem follows by taking norms. □


In Theorem 8.1.15, the $\theta_k$ are called Ritz values, the $y_k$ are called Ritz
vectors, and the $(\theta_k, y_k)$ are called Ritz pairs.

The usefulness of Theorem 8.1.13 is enhanced if we weaken the assumption
that the columns of $Q_1$ are orthonormal. As can be expected, the
bounds deteriorate with the loss of orthogonality.


Theorem 8.1.16 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that

$$A X_1 - X_1 S = F_1,$$

where $X_1 \in \mathbb{R}^{n\times r}$ and $S = X_1^T A X_1$. If

$$\| X_1^T X_1 - I_r \|_2 = \tau < 1, \qquad (8.1.4)$$

then there exist $\mu_1, \ldots, \mu_r \in \lambda(A)$ such that

$$| \mu_k - \lambda_k(S) | \;\le\; \sqrt{2} \left( \| F_1 \|_2 + \tau (2 + \tau) \| A \|_2 \right)$$

for k = 1:r.


Proof. Let $X_1 = ZP$ be the polar decomposition of $X_1$. Recall from
§4.2.10 that this means $Z \in \mathbb{R}^{n\times r}$ has orthonormal columns and $P \in \mathbb{R}^{r\times r}$
is a symmetric positive semidefinite matrix that satisfies $P^2 = X_1^T X_1$.
Taking norms in the equation

$$E_1 \equiv AZ - ZS = (A X_1 - X_1 S) + A(Z - X_1) - (Z - X_1) S = F_1 + AZ(I - P) - Z(I - P) X_1^T A X_1$$

gives

$$\| E_1 \|_2 \;\le\; \| F_1 \|_2 + \| A \|_2 \| I - P \|_2 (1 + \| X_1 \|_2^2). \qquad (8.1.5)$$

Equation (8.1.4) implies that

$$\| X_1 \|_2^2 \;\le\; 1 + \tau. \qquad (8.1.6)$$

Since P is positive semidefinite, (I + P) is nonsingular and so

$$I - P = (I + P)^{-1} (I - P^2) = (I + P)^{-1} (I - X_1^T X_1),$$

which implies $\| I - P \|_2 \le \tau$. By substituting this inequality and (8.1.6)
into (8.1.5) we have $\| E_1 \|_2 \le \| F_1 \|_2 + \tau (2 + \tau) \| A \|_2$. The proof is
completed by noting that we can use Theorem 8.1.13 with $Q_1 = Z$ to
relate the eigenvalues of A and S via the residual $E_1$. □


8.1.5 The Law of Inertia 


The inertia of a symmetric matrix A is a triplet of nonnegative integers
(m, z, p) where m, z, and p are respectively the number of negative, zero,
and positive elements of $\lambda(A)$.


Theorem 8.1.17 (Sylvester Law of Inertia) If $A \in \mathbb{R}^{n\times n}$ is symmetric
and $X \in \mathbb{R}^{n\times n}$ is nonsingular, then A and $X^T A X$ have the same inertia.

Proof. Suppose for some r that $\lambda_r(A) > 0$ and define the subspace
$S_0 \subseteq \mathbb{R}^n$ by

$$S_0 = \mathrm{span}\{ X^{-1} q_1, \ldots, X^{-1} q_r \}, \qquad q_i \ne 0,$$

where $A q_i = \lambda_i(A) q_i$ and i = 1:r. From the minimax characterization of
$\lambda_r(X^T A X)$ we have

$$\lambda_r(X^T A X) = \max_{\dim(S) = r} \; \min_{y \in S} \frac{y^T (X^T A X) y}{y^T y} \;\ge\; \min_{y \in S_0} \frac{y^T (X^T A X) y}{y^T y} .$$

Since

$$y \in \mathbb{R}^n \;\Rightarrow\; \frac{y^T (X^T X) y}{y^T y} \ge \sigma_n(X)^2$$
$$y \in S_0 \;\Rightarrow\; \frac{y^T (X^T A X) y}{y^T (X^T X) y} \ge \lambda_r(A)$$

it follows that

$$\lambda_r(X^T A X) \;\ge\; \min_{y \in S_0} \left\{ \frac{y^T (X^T A X) y}{y^T (X^T X) y} \cdot \frac{y^T (X^T X) y}{y^T y} \right\} \;\ge\; \lambda_r(A) \, \sigma_n(X)^2 .$$

An analogous argument with the roles of A and $X^T A X$ reversed shows that

$$\lambda_r(A) \;\ge\; \lambda_r(X^T A X) \, \sigma_n(X^{-1})^2 = \frac{\lambda_r(X^T A X)}{\sigma_1(X)^2} .$$

Thus, $\lambda_r(A)$ and $\lambda_r(X^T A X)$ have the same sign and so we have shown that
A and $X^T A X$ have the same number of positive eigenvalues. If we apply
this result to -A, we conclude that A and $X^T A X$ have the same number of
negative eigenvalues. Obviously, the number of zero eigenvalues possessed
by each matrix is also the same. □


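Example 8.1.8 If A = diag(3, 2, -1) and

$$X = \begin{bmatrix} 1 & 4 & 5 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix},$$

then

$$X^T A X = \begin{bmatrix} 3 & 12 & 15 \\ 12 & 50 & 64 \\ 15 & 64 & 82 \end{bmatrix}$$

and $\lambda(X^T A X) = \{134.769, .3555, -.1252\}$.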
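
One practical consequence of Theorem 8.1.17 is that the inertia of A can be read from the signs of the pivots in a factorization $A = L D L^T$, since $L^T$ plays the role of the nonsingular X. The sketch below is an illustration under stated assumptions, not an algorithm from this section: the function name `inertia` is hypothetical, elimination is done without pivoting, and the code simply stops if a zero pivot appears.

```python
def inertia(a, tol=1e-12):
    """Inertia (m, z, p) of a symmetric matrix from the pivots of A = L D L^T.

    Sketch only: no pivoting, so every leading pivot must be nonzero.
    """
    n = len(a)
    w = [[float(v) for v in row] for row in a]   # work on a copy
    pivots = []
    for k in range(n):
        d = w[k][k]
        if abs(d) <= tol:
            raise ZeroDivisionError("zero pivot: this sketch does not pivot")
        pivots.append(d)
        # Symmetric elimination step: update the trailing Schur complement.
        for i in range(k + 1, n):
            l_ik = w[i][k] / d
            for j in range(k + 1, n):
                w[i][j] -= l_ik * w[k][j]
    m = sum(1 for d in pivots if d < 0)
    p = sum(1 for d in pivots if d > 0)
    return m, n - m - p, p


# X^T A X from Example 8.1.8 and the original A = diag(3, 2, -1).
B = [[3, 12, 15], [12, 50, 64], [15, 64, 82]]
A = [[3, 0, 0], [0, 2, 0], [0, 0, -1]]
print(inertia(B), inertia(A))
```

For the matrix of Example 8.1.8 the pivots are exactly 3, 2, and -1, because X is unit upper triangular and $X^T A X$ is therefore already in $L D L^T$ form.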


Problems 


P8.1.1 Without using any of the results in this section, show that the eigenvalues of a
2-by-2 symmetric matrix must be real.

P8.1.2 Compute the Schur decomposition of $A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$.

P8.1.3 Show that the eigenvalues of a Hermitian matrix ($A^H = A$) are real. For
each theorem and corollary in this section, state and prove the corresponding result for
Hermitian matrices. Which results have analogs when A is skew-symmetric? (Hint: If
$A^T = -A$, then iA is Hermitian.)

P8.1.4 Show that if $X \in \mathbb{R}^{n\times r}$, $r \le n$, and $\| X^T X - I \|_2 = \tau < 1$, then $\sigma_{\min}(X) \ge 1 - \tau$.

P8.1.5 Suppose $A, E \in \mathbb{R}^{n\times n}$ are symmetric and consider the Schur decomposition
$A + tE = Q D Q^T$ where we assume that Q = Q(t) and D = D(t) are continuously
differentiable functions of $t \in \mathbb{R}$. Show that $\dot{D}(t) = \mathrm{diag}(Q(t)^T E Q(t))$ where the matrix on
the right is the diagonal part of $Q(t)^T E Q(t)$. Establish the Wielandt-Hoffman theorem
by integrating both sides of this equation from 0 to 1 and taking Frobenius norms to
show that

$$\| D(1) - D(0) \|_F \;\le\; \int_0^1 \| \mathrm{diag}(Q(t)^T E Q(t)) \|_F \, dt \;\le\; \| E \|_F .$$

P8.1.6 Prove Theorem 8.1.5.

P8.1.7 Prove Theorem 8.1.7.

P8.1.8 If $C \in \mathbb{R}^{n\times n}$ then the trace function $\mathrm{tr}(C) = c_{11} + \cdots + c_{nn}$ equals the sum of
C's eigenvalues. Use this to prove Theorem 8.1.8.

P8.1.9 Show that if $B \in \mathbb{R}^{m\times m}$ and $C \in \mathbb{R}^{n\times n}$ are symmetric, then
$\mathrm{sep}(B, C) = \min \| BX - XC \|_F$ where the min is taken over all matrices $X \in \mathbb{R}^{m\times n}$
of unit Frobenius norm.

P8.1.10 Prove the inequality (8.1.3).

P8.1.11 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and $C \in \mathbb{R}^{n\times r}$ has full column rank and
assume that $r \le n$. By using Theorem 8.1.8 relate the eigenvalues of $A + CC^T$ to the
eigenvalues of A.


Notes and References for Sec. 8.1 


The perturbation theory for the symmetric eigenvalue problem is surveyed in Wilkinson
(1965, chapter 2), Parlett (1980, chapters 10 and 11), and Stewart and Sun (1990,
chapters 4 and 5). Some representative papers in this well-researched area include

G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with
Certain Eigenvalue Problems," SIAM Review 15, 727-64.

C.C. Paige (1974). "Eigenvalues of Perturbed Hermitian Matrices," Lin. Alg. and Its
Applic. 8, 1-10.

A. Ruhe (1975). "On the Closeness of Eigenvalues and Singular Values for Almost
Normal Matrices," Lin. Alg. and Its Applic. 11, 87-94.

W. Kahan (1975). "Spectra of Nearly Hermitian Matrices," Proc. Amer. Math. Soc.
48, 11-17.

A. Schonhage (1979). "Arbitrary Perturbations of Hermitian Matrices," Lin. Alg. and
Its Applic. 24, 143-49.

P. Deift, T. Nanda, and C. Tomei (1983). "Ordinary Differential Equations and the
Symmetric Eigenvalue Problem," SIAM J. Numer. Anal. 20, 1-22.

D.S. Scott (1985). "On the Accuracy of the Gershgorin Circle Theorem for Bounding
the Spread of a Real Symmetric Matrix," Lin. Alg. and Its Applic. 65, 147-155.

J.-G. Sun (1995). "A Note on Backward Error Perturbations for the Hermitian
Eigenvalue Problem," BIT 35, 385-393.

R.-C. Li (1996). "Relative Perturbation Theory (I) Eigenvalue and Singular Value
Variations," Technical Report UCB//CSD-94-855, Department of EECS, University of
California at Berkeley.

R.-C. Li (1996). "Relative Perturbation Theory (II) Eigenspace and Singular Subspace
Variations," Technical Report UCB//CSD-94-856, Department of EECS, University
of California at Berkeley.


8.2 Power Iterations 


Assume that $A \in \mathbb{R}^{n\times n}$ is symmetric and that $U_0 \in \mathbb{R}^{n\times n}$ is orthogonal.
Consider the following QR iteration:

    $T_0 = U_0^T A U_0$
    for k = 1, 2, ...
        $T_{k-1} = U_k R_k$    (QR factorization)        (8.2.1)
        $T_k = R_k U_k$
    end

Since $T_k = R_k U_k = U_k^T (U_k R_k) U_k = U_k^T T_{k-1} U_k$, it follows by induction
that

$$T_k = (U_0 U_1 \cdots U_k)^T A (U_0 U_1 \cdots U_k). \qquad (8.2.2)$$

Thus, each $T_k$ is orthogonally similar to A. Moreover, the $T_k$ almost
always converge to diagonal form and so it can be said that (8.2.1) almost
always "converges" to a Schur decomposition of A. In order to establish
this remarkable result we first consider the power method and the method
of orthogonal iteration.


8.2.1 The Power Method 


Given a unit 2-norm $q^{(0)} \in \mathbb{R}^n$, the power method produces a sequence of
vectors $q^{(k)}$ as follows:

    for k = 1, 2, ...
        $z^{(k)} = A q^{(k-1)}$
        $q^{(k)} = z^{(k)} / \| z^{(k)} \|_2$                    (8.2.3)
        $\lambda^{(k)} = [q^{(k)}]^T A q^{(k)}$
    end

If $q^{(0)}$ is not "deficient" and A's eigenvalue of maximum modulus is unique,
then the $q^{(k)}$ converge to an eigenvector.


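Theorem 8.2.1 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that

$$Q^T A Q = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$$

where $Q = [\, q_1, \ldots, q_n \,]$ is orthogonal and $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$. Let the
vectors $q^{(k)}$ be specified by (8.2.3) and define $\theta_k \in [0, \pi/2]$ by

$$\cos(\theta_k) = | q_1^T q^{(k)} | .$$

If $\cos(\theta_0) \ne 0$, then

$$| \sin(\theta_k) | \;\le\; \tan(\theta_0) \left| \frac{\lambda_2}{\lambda_1} \right|^k \qquad (8.2.4)$$

$$| \lambda^{(k)} - \lambda_1 | \;\le\; | \lambda_1 - \lambda_n | \tan(\theta_0)^2 \left| \frac{\lambda_2}{\lambda_1} \right|^{2k} \qquad (8.2.5)$$


Proof. From the definition of the iteration, it follows that $q^{(k)}$ is a multiple
of $A^k q^{(0)}$ and so

$$| \sin(\theta_k) |^2 = 1 - (q_1^T q^{(k)})^2 = 1 - \left( \frac{q_1^T A^k q^{(0)}}{\| A^k q^{(0)} \|_2} \right)^2 .$$

If $q^{(0)}$ has the eigenvector expansion $q^{(0)} = a_1 q_1 + \cdots + a_n q_n$, then

$$| a_1 | = | q_1^T q^{(0)} | = \cos(\theta_0) \ne 0, \qquad a_1^2 + \cdots + a_n^2 = 1,$$

and

$$A^k q^{(0)} = a_1 \lambda_1^k q_1 + a_2 \lambda_2^k q_2 + \cdots + a_n \lambda_n^k q_n .$$

Thus,

$$| \sin(\theta_k) |^2 = 1 - \frac{a_1^2 \lambda_1^{2k}}{\sum_{i=1}^{n} a_i^2 \lambda_i^{2k}} = \frac{\sum_{i=2}^{n} a_i^2 \lambda_i^{2k}}{\sum_{i=1}^{n} a_i^2 \lambda_i^{2k}} \;\le\; \frac{\sum_{i=2}^{n} a_i^2 \lambda_i^{2k}}{a_1^2 \lambda_1^{2k}}$$

$$= \frac{1}{a_1^2} \sum_{i=2}^{n} a_i^2 \left( \frac{\lambda_i}{\lambda_1} \right)^{2k} \;\le\; \frac{1}{a_1^2} \left( \sum_{i=2}^{n} a_i^2 \right) \left( \frac{\lambda_2}{\lambda_1} \right)^{2k} = \frac{1 - a_1^2}{a_1^2} \left( \frac{\lambda_2}{\lambda_1} \right)^{2k} = \tan(\theta_0)^2 \left( \frac{\lambda_2}{\lambda_1} \right)^{2k} .$$

This proves (8.2.4). Likewise,

$$\lambda^{(k)} = [q^{(k)}]^T A q^{(k)} = \frac{[q^{(0)}]^T A^{2k+1} q^{(0)}}{[q^{(0)}]^T A^{2k} q^{(0)}} = \frac{\sum_{i=1}^{n} a_i^2 \lambda_i^{2k+1}}{\sum_{i=1}^{n} a_i^2 \lambda_i^{2k}}$$

and so

$$| \lambda^{(k)} - \lambda_1 | = \left| \frac{\sum_{i=2}^{n} a_i^2 \lambda_i^{2k} (\lambda_i - \lambda_1)}{\sum_{i=1}^{n} a_i^2 \lambda_i^{2k}} \right| \;\le\; | \lambda_1 - \lambda_n | \, \frac{1}{a_1^2} \sum_{i=2}^{n} a_i^2 \left( \frac{\lambda_i}{\lambda_1} \right)^{2k} \;\le\; | \lambda_1 - \lambda_n | \tan(\theta_0)^2 \left( \frac{\lambda_2}{\lambda_1} \right)^{2k} . \quad \Box$$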


Example 8.2.1 The eigenvalues of

$$A = \begin{bmatrix} -1.6407 & 1.0814 & 1.2014 & 1.1539 \\ 1.0814 & 4.1573 & 7.4035 & -1.0463 \\ 1.2014 & 7.4035 & 2.7890 & -1.5737 \\ 1.1539 & -1.0463 & -1.5737 & 8.6944 \end{bmatrix}$$

are given by $\lambda(A) = \{12, 8, -4, -2\}$. If (8.2.3) is applied to this matrix with
$q^{(0)} = [1, 0, 0, 0]^T$, then

     k  | $\lambda^{(k)}$
     1  |
     2  |
     3  |
     4  |
     5  | 11.5259
     6  | 11.7747
     7  | 11.8967
     8  | 11.9534
     9  | 11.9792
    10  | 11.9907

Observe the convergence to $\lambda_1 = 12$ with rate $|\lambda_2 / \lambda_1|^{2k} = (8/12)^{2k} = (4/9)^k$.
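
The iteration (8.2.3) is short enough to try directly on the matrix of Example 8.2.1. The following is a minimal sketch in plain Python (lists rather than a matrix library, so that it runs without extra packages); the function name `power_method` and its return value, the list of estimates $\lambda^{(1)}, \lambda^{(2)}, \ldots$, are choices made for this illustration.

```python
import math


def power_method(a, q, steps):
    """Run (8.2.3) for `steps` iterations and return [lambda^(1), lambda^(2), ...]."""
    n = len(a)
    history = []
    for _ in range(steps):
        z = [sum(a[i][j] * q[j] for j in range(n)) for i in range(n)]   # z = A q
        norm = math.sqrt(sum(v * v for v in z))
        q = [v / norm for v in z]                                       # normalize
        aq = [sum(a[i][j] * q[j] for j in range(n)) for i in range(n)]
        history.append(sum(q[i] * aq[i] for i in range(n)))            # q^T A q
    return history


A = [[-1.6407, 1.0814, 1.2014, 1.1539],
     [1.0814, 4.1573, 7.4035, -1.0463],
     [1.2014, 7.4035, 2.7890, -1.5737],
     [1.1539, -1.0463, -1.5737, 8.6944]]

estimates = power_method(A, [1.0, 0.0, 0.0, 0.0], 10)
print([round(v, 4) for v in estimates[4:]])   # lambda^(5), ..., lambda^(10)
```

Successive errors $12 - \lambda^{(k)}$ shrink by a factor close to 4/9 per step, in line with (8.2.5). The second matrix-vector product inside the loop is there only for clarity; a production code would reuse it as the next z.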


Computable error bounds for the power method can be obtained by using
Theorem 8.1.13. If

$$\| A q^{(k)} - \lambda^{(k)} q^{(k)} \|_2 = \delta,$$

then there exists $\lambda \in \lambda(A)$ such that $| \lambda^{(k)} - \lambda | \le \sqrt{2} \, \delta$.


8.2.2 Inverse Iteration 


Suppose the power method is applied with A replaced by $(A - \lambda I)^{-1}$. If $\lambda$
is very close to a distinct eigenvalue of A, then the next iterate vector will
be very rich in the corresponding eigendirection:

$$x = \sum_{i=1}^{n} a_i q_i, \quad A q_i = \lambda_i q_i, \; i = 1{:}n \qquad \Rightarrow \qquad (A - \lambda I)^{-1} x = \sum_{i=1}^{n} \frac{a_i}{\lambda_i - \lambda} \, q_i .$$

Thus, if $\lambda \approx \lambda_j$ and $a_j$ is not too small, then this vector has a strong
component in the direction of $q_j$. This process is called inverse iteration
and it requires the solution of a linear system with matrix of coefficients
$A - \lambda I$.


8.2.3 Rayleigh Quotient Iteration


Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that x is a given nonzero n-vector. A
simple differentiation reveals that

$$\lambda = r(x) \equiv \frac{x^T A x}{x^T x}$$

minimizes $\| (A - \lambda I) x \|_2$. (See also Theorem 8.1.14.) The scalar r(x) is
called the Rayleigh quotient of x. Clearly, if x is an approximate
eigenvector, then r(x) is a reasonable choice for the corresponding eigenvalue.
Combining this idea with inverse iteration gives rise to the Rayleigh quotient
iteration:

    $x_0$ given, $\| x_0 \|_2 = 1$
    for k = 0, 1, ...
        $\mu_k = r(x_k)$                                          (8.2.6)
        Solve $(A - \mu_k I) z_{k+1} = x_k$ for $z_{k+1}$
        $x_{k+1} = z_{k+1} / \| z_{k+1} \|_2$
    end

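Example 8.2.2 If (8.2.6) is applied to

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 3 & 6 & 10 & 15 & 21 \\ 1 & 4 & 10 & 20 & 35 & 56 \\ 1 & 5 & 15 & 35 & 70 & 126 \\ 1 & 6 & 21 & 56 & 126 & 252 \end{bmatrix}$$

with $x_0 = [1, 1, 1, 1, 1, 1]^T / 6$, then the iterates $\mu_k$ approach the eigenvalue
$\lambda = 15.5534732737$.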


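The Rayleigh quotient iteration almost always converges and when it
does, the rate of convergence is cubic. We demonstrate this for the case
n = 2. Without loss of generality, we may assume that $A = \mathrm{diag}(\lambda_1, \lambda_2)$,
with $\lambda_1 > \lambda_2$. Denoting $x_k$ by

$$x_k = \begin{bmatrix} c_k \\ s_k \end{bmatrix}, \qquad c_k^2 + s_k^2 = 1,$$

it follows that $\mu_k = \lambda_1 c_k^2 + \lambda_2 s_k^2$ in (8.2.6) and

$$z_{k+1} = \frac{1}{\lambda_1 - \lambda_2} \begin{bmatrix} c_k / s_k^2 \\ -s_k / c_k^2 \end{bmatrix} .$$

A calculation shows that

$$c_{k+1} = \frac{c_k^3}{\sqrt{c_k^6 + s_k^6}}, \qquad s_{k+1} = \frac{-s_k^3}{\sqrt{c_k^6 + s_k^6}} . \qquad (8.2.7)$$

From these equations it is clear that the $x_k$ converge cubically to either
$\mathrm{span}\{e_1\}$ or $\mathrm{span}\{e_2\}$ provided $|c_k| \ne |s_k|$.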
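
The recurrence (8.2.7) can be confirmed numerically by carrying out one pass of (8.2.6) on a diagonal 2-by-2 matrix. In the sketch below the helpers `solve` (Gaussian elimination with partial pivoting) and `rqi_step` are illustrative names introduced here, and the values $\lambda_1 = 2$, $\lambda_2 = 1$, $c_0 = 0.8$, $s_0 = 0.6$ are arbitrary test data rather than anything taken from the text.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]     # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rqi_step(a, x):
    """One pass of (8.2.6): return (mu_k, x_{k+1}) for a unit-norm x = x_k."""
    n = len(x)
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    mu = sum(x[i] * ax[i] for i in range(n))              # Rayleigh quotient
    shifted = [[a[i][j] - (mu if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    z = solve(shifted, x)
    norm = math.sqrt(sum(v * v for v in z))
    return mu, [v / norm for v in z]


A = [[2.0, 0.0], [0.0, 1.0]]        # diag(lambda_1, lambda_2), illustrative values
c, s = 0.8, 0.6
mu, x_next = rqi_step(A, [c, s])
den = math.sqrt(c ** 6 + s ** 6)
predicted = [c ** 3 / den, -s ** 3 / den]                 # right side of (8.2.7)
print(mu, x_next, predicted)
```

Note that the shifted system becomes nearly singular as $\mu_k$ approaches an eigenvalue. That is harmless for the direction of $z_{k+1}$ in exact arithmetic, but this sketch makes no attempt to guard against an exactly zero pivot.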

Details associated with the practical implementation of the Rayleigh
quotient iteration may be found in Parlett (1974).

8.2.4 Orthogonal Iteration


A straightforward generalization of the power method can be used to
compute higher-dimensional invariant subspaces. Let r be a chosen integer
satisfying $1 \le r \le n$. Given an n-by-r matrix $Q_0$ with orthonormal
columns, the method of orthogonal iteration generates a sequence of
matrices $\{Q_k\} \subseteq \mathbb{R}^{n\times r}$ as follows:

    for k = 1, 2, ...
        $Z_k = A Q_{k-1}$                                         (8.2.8)
        $Q_k R_k = Z_k$    (QR factorization)
    end

Note that if r = 1, then this is just the power method. Moreover, the
sequence $\{Q_k e_1\}$ is precisely the sequence of vectors produced by the power
iteration with starting vector $q^{(0)} = Q_0 e_1$.

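In order to analyze the behavior of (8.2.8), assume that

$$Q^T A Q = D = \mathrm{diag}(\lambda_i), \qquad |\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|, \qquad (8.2.9)$$

is a Schur decomposition of $A \in \mathbb{R}^{n\times n}$. Partition Q and D as follows:

$$Q = [\, Q_\alpha \;\; Q_\beta \,], \qquad D = \begin{bmatrix} D_\alpha & 0 \\ 0 & D_\beta \end{bmatrix}, \qquad (8.2.10)$$

where $Q_\alpha$ has r columns and $D_\alpha$ is r-by-r. If $|\lambda_r| > |\lambda_{r+1}|$, then

$$D_r(A) = \mathrm{ran}(Q_\alpha)$$

is the dominant invariant subspace of dimension r. It is the unique invariant
subspace associated with the eigenvalues $\lambda_1, \ldots, \lambda_r$.

The following theorem shows that with reasonable assumptions, the
subspaces $\mathrm{ran}(Q_k)$ generated by (8.2.8) converge to $D_r(A)$ at a rate
proportional to $|\lambda_{r+1} / \lambda_r|^k$.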


Theorem 8.2.2 Let the Schur decomposition of A € IR"*" be given by 
(8.2.9) and (8.2.10) with n > 2. Assume that |A| > |A-41| and that the 
n-by-r matrices {Qx} are defined by (8.2.8). If 8 € [0,7/2] is specified by 


MOTA BR PT 
wep ta) li Mall v lle 
vcran(Q»o) 
then 
dist( D, (A), ran(Q)) < tan(6) ^n 


See also Theorem 7.3.1. 
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Proof. By induction it can be shown that 


A*Qo = Qk (Ry +++ R1) 
and so with the partitionings (8.2.10) we have 


Df 0 Q7 Qo gres | - 
[S oleo. | = (diix eco 


T QTQ | _ 
Q'Q. = Ra, Qs] Qk = dro. | - E |. 


COS(Ümin) = or(Vo) = 1-|] Wo IE 
dist( D. (A), ran(Q,)) || We lle 
DE Vo Vi (Re: +> F3) 
DiWe = Wy (Re-+- Ri) 


If 


then 


It follows that Vo is nonsingular which in turn implies that V, and (Ax -:: 


are also nonsingular. Thus, 


Wk DEW) (Ry --- Ry! = DiW (V DEV) 


DEW Vg DI" Vk 


and so 


| We lle x || DE lla Il Wo llo I| ~ , I D; * lla || Ve flo 


k 
Àrj1 


< ral” sin(#) — — À 


= tan(@) 


=) x Dri 
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Example 8.2.3 If (8.2.8) is applied to the matrix of Example 8.2.1 with r = 2 and 


Qo = I4(:, 1:2), then 


how 


dist (D2(A),ran(Qx.)) 
0.8806 
0.4091 
0.1121 
0.0313 
0.0106 
0.0044 
0.0020 
0.0010 
0.0005 
0.0002 


Q «o 9o -1 0) S C to = 


[m 
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8.2.5 The QR Iteration 


Consider what happens when we apply the method of orthogonal iteration
(8.2.8) with r = n. Let $Q^T A Q = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ be the Schur
decomposition and assume

$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|.$$

If $Q = [\, q_1, \ldots, q_n \,]$ and $Q_k = [\, q_1^{(k)}, \ldots, q_n^{(k)} \,]$ and

$$\mathrm{dist}(D_i(A), \mathrm{span}\{q_1^{(0)}, \ldots, q_i^{(0)}\}) < 1 \qquad (8.2.11)$$

for i = 1:n-1, then it follows from Theorem 8.2.2 that

$$\mathrm{dist}(\mathrm{span}\{q_1^{(k)}, \ldots, q_i^{(k)}\}, \mathrm{span}\{q_1, \ldots, q_i\}) = O\!\left( \left| \frac{\lambda_{i+1}}{\lambda_i} \right|^k \right)$$

for i = 1:n-1. This implies that the matrices $T_k$ defined by

$$T_k = Q_k^T A Q_k$$

are converging to diagonal form. Thus, it can be said that the method
of orthogonal iteration computes a Schur decomposition if r = n and the
original iterate $Q_0 \in \mathbb{R}^{n \times n}$ is not deficient in the sense of (8.2.11).

The QR iteration arises by considering how to compute the matrix $T_k$
directly from its predecessor $T_{k-1}$. On the one hand, we have from (8.2.1)
and the definition of $T_{k-1}$ that

$$T_{k-1} = Q_{k-1}^T A Q_{k-1} = Q_{k-1}^T (A Q_{k-1}) = (Q_{k-1}^T Q_k) R_k .$$

On the other hand,

$$T_k = Q_k^T A Q_k = (Q_k^T A Q_{k-1})(Q_{k-1}^T Q_k) = R_k (Q_{k-1}^T Q_k).$$

Thus, $T_k$ is determined by computing the QR factorization of $T_{k-1}$ and
then multiplying the factors together in reverse order. This is precisely
what is done in (8.2.1).
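The factor-then-reverse recipe can be illustrated with a minimal Python sketch. It is not the book's code: the 3-by-3 test matrix is an assumption chosen for illustration (it is not the matrix of Example 8.2.1), the helper names `qr_gram_schmidt`, `matmul`, and `qr_iteration` are hypothetical, and modified Gram-Schmidt is used for the QR factorization only to keep the sketch short.

```python
import math

def qr_gram_schmidt(A):
    """QR factorization of a square nonsingular matrix (modified Gram-Schmidt)."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i, q in enumerate(q_cols):
            # Remove the component of v along the i-th orthonormal column.
            R[i][j] = sum(q[k] * v[k] for k in range(n))
            v = [v[k] - R[i][j] * q[k] for k in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / R[j][j] for x in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def qr_iteration(T, steps):
    """Unshifted QR iteration: factor T = QR, then overwrite T with RQ."""
    for _ in range(steps):
        Q, R = qr_gram_schmidt(T)
        T = matmul(R, Q)
    return T

# Illustrative symmetric matrix with eigenvalues 3 - sqrt(3), 3, 3 + sqrt(3).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
T = qr_iteration(A, 100)
```

After enough steps the off-diagonal entries of `T` are negligible and its diagonal holds approximations to the eigenvalues, in line with the linear convergence rate discussed above.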


Example 8.2.4 If the QR iteration (8.2.1) is applied to the matrix in Example 8.2.1,
then after 10 iterations

$$T_{10} = \begin{bmatrix}
11.9907 & -0.1926 & -0.0004 & 0.0000 \\
-0.1926 & 8.0093 & -0.0029 & 0.0001 \\
-0.0004 & -0.0029 & -4.0000 & 0.0007 \\
0.0000 & 0.0001 & 0.0007 & -2.0000
\end{bmatrix}.$$

The off-diagonal entries of the $T_k$ matrices go to zero as follows:

     k   |T_k(2,1)|  |T_k(3,1)|  |T_k(4,1)|  |T_k(3,2)|  |T_k(4,2)|  |T_k(4,3)|
     1    3.9254      1.8122      3.3892      4.2492      2.8367      1.1679
     2    2.6491      1.2841      2.1908      1.1587      3.1473      0.2294
     3    2.0147      0.6154      0.5082      0.0997      0.9859      0.0748
     4    1.6930      0.2408      0.0970      0.0723      0.2596      0.0440
     5    1.2928      0.0866      0.0173      0.0665      0.0667      0.0233
     6    0.9222      0.0299      0.0030      0.0405      0.0169      0.0118
     7    0.6346      0.0101      0.0005      0.0219      0.0043      0.0059
     8    0.4292      0.0034      0.0001      0.0113      0.0011      0.0030
     9    0.2880      0.0011      0.0000      0.0057      0.0003      0.0015
    10    0.1926      0.0004      0.0000      0.0029      0.0001      0.0007


Note that a single QR iteration involves $O(n^3)$ flops. Moreover, since
convergence is only linear (when it exists), it is clear that the method is a
prohibitively expensive way to compute Schur decompositions. Fortunately,
these practical difficulties can be overcome as we show in the next section.


Problems 


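P8.2.1 Suppose $A_0 \in \mathbb{R}^{n \times n}$ is symmetric and positive definite and consider the following
iteration:

    for k = 1, 2, ...
        A_{k-1} = G_k G_k^T      (Cholesky)
        A_k = G_k^T G_k
    end

(a) Show that this iteration is defined. (b) Show that if $A_0 = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ with $a \ge c$ has
eigenvalues $\lambda_1 \ge \lambda_2 > 0$, then the $A_k$ converge to $\mathrm{diag}(\lambda_1, \lambda_2)$.

P8.2.2 Prove (8.2.7).

P8.2.3 Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric and define the function $f: \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$ by

$$f\!\left( \begin{bmatrix} x \\ \lambda \end{bmatrix} \right) = \begin{bmatrix} Ax - \lambda x \\ (x^T x - 1)/2 \end{bmatrix}$$

where $x \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$. Suppose $x_+$ and $\lambda_+$ are produced by applying Newton's
method to f at the "current point" defined by $x_c$ and $\lambda_c$. Give expressions for $x_+$ and
$\lambda_+$ assuming that $\|x_c\|_2 = 1$ and $\lambda_c = x_c^T A x_c$.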


Notes and References for Sec. 8.2 


The following references are concerned with the method of orthogonal iteration (a.k.a.
the method of simultaneous iteration):


G.W. Stewart (1969). “Accelerating The Orthogonal Iteration for the Eigenvalues of a
Hermitian Matrix,” Numer. Math. 13, 362-76.

M. Clint and A. Jennings (1970). “The Evaluation of Eigenvalues and Eigenvectors of
Real Symmetric Matrices by Simultaneous Iteration,” Comp. J. 13, 76-80.

H. Rutishauser (1970). “Simultaneous Iteration Method for Symmetric Matrices,”
Numer. Math. 16, 205-23. See also Wilkinson and Reinsch (1971, pp.284-302).


References for the Rayleigh quotient method include


J. Vandergraft (1971). “Generalized Rayleigh Methods with Applications to Finding
Eigenvalues of Large Matrices,” Lin. Alg. and Its Applic. 4, 353-68.

B.N. Parlett (1974). “The Rayleigh Quotient Iteration and Some Generalizations for
Nonnormal Matrices,” Math. Comp. 28, 679-93.

R.A. Tapia and D.L. Whitley (1988). “The Projected Newton Method Has Order $1 + \sqrt{2}$
for the Symmetric Eigenvalue Problem,” SIAM J. Num. Anal. 25, 1376-1382.

S. Batterson and J. Smillie (1989). “The Dynamics of Rayleigh Quotient Iteration,”
SIAM J. Num. Anal. 26, 624-636.

C. Beattie and D.W. Fox (1989). “Localization Criteria and Containment for Rayleigh
Quotient Iteration,” SIAM J. Matrix Anal. Appl. 10, 80-93.

P.T.P. Tang (1994). “Dynamic Condition Estimation and Rayleigh-Ritz Approximation,”
SIAM J. Matrix Anal. Appl. 15, 331-346.


8.3 The Symmetric QR Algorithm 


The symmetric QR iteration (8.2.1) can be made very efficient in two ways.
First, we show how to compute an orthogonal $U_0$ such that $U_0^T A U_0 = T$ is
tridiagonal. With this reduction, the iterates produced by (8.2.1) are all
tridiagonal and this reduces the work per step to $O(n^2)$. Second, the idea of
shifts is introduced and with this change the convergence to diagonal form
proceeds at a cubic rate. This is far better than having the off-diagonal
entries going to zero like $|\lambda_{i+1}/\lambda_i|^k$ as discussed in §8.2.5.


8.3.1 Reduction to Tridiagonal Form 


If A is symmetric, then it is possible to find an orthogonal Q such that

$$Q^T A Q = T \qquad (8.3.1)$$

is tridiagonal. We call this the tridiagonal decomposition and as a
compression of data, it represents a very big step towards diagonalization.

We show how to compute (8.3.1) with Householder matrices. Suppose
that Householder matrices $P_1, \ldots, P_{k-1}$ have been determined such that if
$A_{k-1} = (P_1 \cdots P_{k-1})^T A (P_1 \cdots P_{k-1})$, then

$$A_{k-1} = \begin{bmatrix} B_{11} & B_{12} & 0 \\ B_{21} & B_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{bmatrix}$$

(with block rows and block columns of sizes k-1, 1, and n-k) is tridiagonal
through its first k-1 columns. If $\bar{P}_k$ is an order n-k
Householder matrix such that $\bar{P}_k B_{32}$ is a multiple of $I_{n-k}(:,1)$ and if
$P_k = \mathrm{diag}(I_k, \bar{P}_k)$, then the leading k-by-k principal submatrix of

$$A_k = P_k A_{k-1} P_k = \begin{bmatrix} B_{11} & B_{12} & 0 \\ B_{21} & B_{22} & B_{23} \bar{P}_k \\ 0 & \bar{P}_k B_{32} & \bar{P}_k B_{33} \bar{P}_k \end{bmatrix}$$

is tridiagonal. Clearly, if $U_0 = P_1 \cdots P_{n-2}$, then $U_0^T A U_0 = T$ is tridiagonal.
In the calculation of $A_k$ it is important to exploit symmetry during the
formation of the matrix $\bar{P}_k B_{33} \bar{P}_k$. To be specific, suppose that $\bar{P}_k$ has the
form

$$\bar{P}_k = I - \beta v v^T, \qquad \beta = 2 / v^T v, \qquad 0 \ne v \in \mathbb{R}^{n-k}.$$

Note that if $p = \beta B_{33} v$ and $w = p - (\beta p^T v / 2) v$, then

$$\bar{P}_k B_{33} \bar{P}_k = B_{33} - v w^T - w v^T .$$

Since only the upper triangular portion of this matrix needs to be
calculated, we see that the transition from $A_{k-1}$ to $A_k$ can be accomplished in
only $4(n-k)^2$ flops.
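The rank-2 update formula can be checked directly from the definitions of $p$ and $w$. A short verification, writing $B$ for $B_{33}$ and using only that $B$ is symmetric:

```latex
\begin{aligned}
\bar{P}_k B \bar{P}_k
  &= (I - \beta v v^T)\, B\, (I - \beta v v^T) \\
  &= B - \beta v (Bv)^T - \beta (Bv) v^T + \beta^2 (v^T B v)\, v v^T \\
  &= B - v p^T - p v^T + \beta (p^T v)\, v v^T
     && \text{since } p = \beta B v, \\
v w^T + w v^T
  &= v p^T + p v^T - \beta (p^T v)\, v v^T
     && \text{since } w = p - (\beta p^T v / 2)\, v .
\end{aligned}
```

Subtracting the last line from $B$ reproduces the third line, which is the identity $\bar{P}_k B \bar{P}_k = B - v w^T - w v^T$.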


Algorithm 8.3.1 (Householder Tridiagonalization) Given a symmetric
$A \in \mathbb{R}^{n \times n}$, the following algorithm overwrites A with $T = Q^T A Q$,
where T is tridiagonal and $Q = H_1 \cdots H_{n-2}$ is the product of Householder
transformations.


    for k = 1:n-2
        [v, β] = house(A(k+1:n, k))
        p = β A(k+1:n, k+1:n) v
        w = p - (β p^T v / 2) v
        A(k+1, k) = || A(k+1:n, k) ||_2;  A(k, k+1) = A(k+1, k)
        A(k+1:n, k+1:n) = A(k+1:n, k+1:n) - v w^T - w v^T
    end


This algorithm requires $4n^3/3$ flops when symmetry is exploited in
calculating the rank-2 update. The matrix Q can be stored in factored form in
the subdiagonal portion of A. If Q is explicitly required, then it can be
formed with an additional $4n^3/3$ flops.
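A minimal Python sketch of the same reduction follows. It is an illustration under stated assumptions rather than the book's implementation: the function name `householder_tridiagonalize` is hypothetical, the matrix is stored as a dense list of lists, Q is not accumulated, and the Householder vector is built inline with the sign choice $v = x + \mathrm{sign}(x_1)\|x\|_2 e_1$. With that sign choice the computed subdiagonal entry can be $-\|x\|_2$ instead of $+\|x\|_2$, so subdiagonal entries may differ in sign from those produced by the algorithm above.

```python
import math

def householder_tridiagonalize(A):
    """Return a tridiagonal matrix orthogonally similar to the symmetric A."""
    n = len(A)
    T = [row[:] for row in A]              # work on a copy
    for k in range(n - 2):
        x = [T[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue                       # column k is already reduced
        # The reflector maps x to alpha * e_1; this sign avoids cancellation.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        beta = 2.0 / sum(vi * vi for vi in v)
        m = n - k - 1
        # p = beta * B33 * v and w = p - (beta * p^T v / 2) * v
        p = [beta * sum(T[k + 1 + i][k + 1 + j] * v[j] for j in range(m))
             for i in range(m)]
        g = beta * sum(pi * vi for pi, vi in zip(p, v)) / 2.0
        w = [pi - g * vi for pi, vi in zip(p, v)]
        # Rank-2 update of the trailing block: B33 - v w^T - w v^T
        for i in range(m):
            for j in range(m):
                T[k + 1 + i][k + 1 + j] -= v[i] * w[j] + w[i] * v[j]
        # Column k and row k now hold (alpha, 0, ..., 0) below the diagonal.
        T[k + 1][k] = T[k][k + 1] = alpha
        for i in range(k + 2, n):
            T[i][k] = T[k][i] = 0.0
    return T

T = householder_tridiagonalize([[1.0, 3.0, 4.0],
                                [3.0, 2.0, 8.0],
                                [4.0, 8.0, 3.0]])
```

For brevity the sketch updates the full trailing block; exploiting symmetry by updating only one triangle is what gives the flop count quoted above.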


Example 8.3.1

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & .6 & .8 \\ 0 & .8 & -.6 \end{bmatrix}^T \begin{bmatrix} 1 & 3 & 4 \\ 3 & 2 & 8 \\ 4 & 8 & 3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & .6 & .8 \\ 0 & .8 & -.6 \end{bmatrix} = \begin{bmatrix} 1 & 5 & 0 \\ 5 & 10.32 & 1.76 \\ 0 & 1.76 & -5.32 \end{bmatrix}.$$


Note that if T has a zero subdiagonal, then the eigenproblem splits into
a pair of smaller eigenproblems. In particular, if $t_{k+1,k} = 0$, then
$\lambda(T) = \lambda(T(1{:}k, 1{:}k)) \cup \lambda(T(k+1{:}n, k+1{:}n))$. If T has no zero subdiagonal entries,
then it is said to be unreduced.

Let $\hat{T}$ denote the computed version of T obtained by Algorithm 8.3.1.
It can be shown that $\hat{T} = \tilde{Q}^T (A + E) \tilde{Q}$ where $\tilde{Q}$ is exactly orthogonal and
E is a symmetric matrix satisfying $\|E\|_F \le c\,\mathbf{u}\,\|A\|_F$ where c is a small
constant. See Wilkinson (1965, p. 297).


8.3.2 Properties of the Tridiagonal Decomposition 


We prove two theorems about the tridiagonal decomposition both of which
have key roles to play in the sequel. The first connects (8.3.1) to the QR
factorization of a certain Krylov matrix. These matrices have the form

$$K(A, v, k) = [\, v, \; Av, \; \ldots, \; A^{k-1} v \,], \qquad A \in \mathbb{R}^{n \times n}, \; v \in \mathbb{R}^n .$$


Theorem 8.3.1 If $Q^T A Q = T$ is the tridiagonal decomposition of the
symmetric matrix $A \in \mathbb{R}^{n \times n}$, then $Q^T K(A, Q(:,1), n) = R$ is upper triangular.
If R is nonsingular, then T is unreduced. If R is singular and k is the
smallest index so $r_{kk} = 0$, then k is also the smallest index so $t_{k,k-1}$ is
zero. See also Theorem 7.4.3.


Proof. It is clear that if $q_1 = Q(:,1)$, then

$$\begin{aligned}
Q^T K(A, Q(:,1), n) &= [\, Q^T q_1, \; (Q^T A Q)(Q^T q_1), \; \ldots, \; (Q^T A Q)^{n-1} (Q^T q_1) \,] \\
 &= [\, e_1, \; T e_1, \; \ldots, \; T^{n-1} e_1 \,] = R
\end{aligned}$$

is upper triangular with the property that $r_{11} = 1$ and $r_{ii} = t_{21} t_{32} \cdots t_{i,i-1}$
for i = 2:n. Clearly, if R is nonsingular, then T is unreduced. If R is
singular and $r_{kk}$ is its first zero diagonal entry, then $k \ge 2$ and $t_{k,k-1}$ is the
first zero subdiagonal entry. $\Box$


The next result shows that Q is essentially unique once Q(:, 1) is specified.


Theorem 8.3.2 (Implicit Q Theorem) Suppose $Q = [\, q_1, \ldots, q_n \,]$ and
$V = [\, v_1, \ldots, v_n \,]$ are orthogonal matrices with the property that both $Q^T A Q = T$
and $V^T A V = S$ are tridiagonal where $A \in \mathbb{R}^{n \times n}$ is symmetric. Let k
denote the smallest positive integer for which $t_{k+1,k} = 0$, with the
convention that k = n if T is unreduced. If $v_1 = q_1$, then $v_i = \pm q_i$ and
$|t_{i,i-1}| = |s_{i,i-1}|$ for i = 2:k. Moreover, if k < n, then $s_{k+1,k} = 0$. See also Theorem
7.4.2.


Proof. Define the orthogonal matrix $W = Q^T V$ and observe that $W(:,1) = I_n(:,1) = e_1$
and $W^T T W = S$. By Theorem 8.3.1, $W^T K(T, e_1, k)$ is upper
triangular with full column rank. But $K(T, e_1, k)$ is upper triangular and
so by the essential uniqueness of the thin QR factorization,

$$W(:, 1{:}k) = I_n(:, 1{:}k)\, \mathrm{diag}(\pm 1, \ldots, \pm 1).$$

This says that $Q(:,i) = \pm V(:,i)$ for i = 1:k. The comments about the
subdiagonal entries follow from this since $t_{i+1,i} = Q(:,i+1)^T A Q(:,i)$ and
$s_{i+1,i} = V(:,i+1)^T A V(:,i)$ for i = 1:n-1. $\Box$


8.3.3 The QR Iteration and Tridiagonal Matrices


We quickly state four facts that pertain to the QR iteration and tridiagonal
matrices. Complete verifications are straightforward.


1. Preservation of Form. If T = QR is the QR factorization of a
   symmetric tridiagonal matrix $T \in \mathbb{R}^{n \times n}$, then Q has lower bandwidth 1
   and R has upper bandwidth 2 and it follows that

   $$T_+ = RQ = Q^T (QR) Q = Q^T T Q$$

   is also symmetric and tridiagonal.

2. Shifts. If $s \in \mathbb{R}$ and $T - sI = QR$ is the QR factorization, then

   $$T_+ = RQ + sI = Q^T T Q$$

   is also tridiagonal. This is called a shifted QR step.

3. Perfect Shifts. If T is unreduced, then the first n-1 columns of $T - sI$
   are independent regardless of s. Thus, if $s \in \lambda(T)$ and

   $$QR = T - sI$$

   is a QR factorization, then $r_{nn} = 0$ and the last column of
   $T_+ = RQ + sI$ equals $s I_n(:, n) = s e_n$.

4. Cost. If $T \in \mathbb{R}^{n \times n}$ is tridiagonal, then its QR factorization can be
   computed by applying a sequence of n-1 Givens rotations:

       for k = 1:n-1
           [c, s] = givens(t_{kk}, t_{k+1,k})
           m = min{k+2, n}
           T(k:k+1, k:m) = [c s; -s c]^T T(k:k+1, k:m)
       end

   This requires O(n) flops. If the rotations are accumulated, then $O(n^2)$
   flops are needed.


8.3.4 Explicit Single Shift QR Iteration


If s is a good approximate eigenvalue, then we suspect that the (n, n-1)
entry will be small after a QR step with shift s. This is the philosophy behind
the following iteration:


    T = U_0^T A U_0                  (tridiagonal)
    for k = 0, 1, ...
        Determine real shift μ.                          (8.3.2)
        T - μI = UR                  (QR factorization)
        T = RU + μI
    end

If

$$T = \begin{bmatrix} a_1 & b_1 & & \cdots & 0 \\ b_1 & a_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & b_{n-1} \\ 0 & \cdots & & b_{n-1} & a_n \end{bmatrix}$$

then one reasonable choice for the shift is $\mu = a_n$. However, a more effective
choice is to shift by the eigenvalue of

$$T(n-1{:}n, \, n-1{:}n) = \begin{bmatrix} a_{n-1} & b_{n-1} \\ b_{n-1} & a_n \end{bmatrix}$$

that is closer to $a_n$. This is known as the Wilkinson shift and it is given
by

$$\mu = a_n + d - \mathrm{sign}(d) \sqrt{d^2 + b_{n-1}^2} \qquad (8.3.3)$$

where $d = (a_{n-1} - a_n)/2$. Wilkinson (1968b) has shown that (8.3.2) is
cubically convergent with either shift strategy, but gives heuristic reasons
why (8.3.3) is preferred.
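Formula (8.3.3) is easy to evaluate directly. The following Python sketch uses a hypothetical helper name `wilkinson_shift` and assumes the convention sign(0) = +1, which the text does not specify. It is applied to an illustrative trailing block with $a_{n-1} = 3$, $a_n = 4$, and $b_{n-1} = 0.01$.

```python
import math

def wilkinson_shift(a_prev, a_last, b_last):
    """Evaluate (8.3.3) for the block [[a_prev, b_last], [b_last, a_last]]."""
    d = (a_prev - a_last) / 2.0
    sign_d = 1.0 if d >= 0 else -1.0      # assumption: sign(0) taken as +1
    return a_last + d - sign_d * math.hypot(d, b_last)

mu = wilkinson_shift(3.0, 4.0, 0.01)
# Residual of the characteristic polynomial of the 2-by-2 block at mu.
residual = (3.0 - mu) * (4.0 - mu) - 0.01 ** 2
```

The residual is zero up to roundoff, confirming that `mu` is an eigenvalue of the 2-by-2 block, and `mu` lies within about 1e-4 of $a_n = 4$, so it is the eigenvalue closer to $a_n$.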


8.3.5 Implicit Shift Version 


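It is possible to execute the transition from T to $T_+ = RU + \mu I = U^T T U$
without explicitly forming the matrix $T - \mu I$. This has advantages when
the shift is much larger than some of the $a_i$. Let $c = \cos(\theta)$ and $s = \sin(\theta)$
be computed such that

$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} a_1 - \mu \\ b_1 \end{bmatrix} = \begin{bmatrix} \times \\ 0 \end{bmatrix}.$$

If we set $G_1 = G(1, 2, \theta)$ then $G_1 e_1 = U e_1$ and

$$T \leftarrow G_1^T T G_1 = \begin{bmatrix}
\times & \times & + & 0 & 0 & 0 \\
\times & \times & \times & 0 & 0 & 0 \\
+ & \times & \times & \times & 0 & 0 \\
0 & 0 & \times & \times & \times & 0 \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}.$$

We are thus in a position to apply the Implicit Q theorem provided we can
compute rotations $G_2, \ldots, G_{n-1}$ with the property that if $Z = G_1 G_2 \cdots G_{n-1}$
then $Z e_1 = G_1 e_1 = U e_1$ and $Z^T T Z$ is tridiagonal.

Note that the first column of Z and U are identical provided we take
each $G_i$ to be of the form $G_i = G(i, i+1, \theta_i)$, i = 2:n-1. But $G_i$ of this
form can be used to chase the unwanted nonzero element "+" out of the
matrix $G_1^T T G_1$ as follows:

$$\xrightarrow{G_2} \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
\times & \times & \times & + & 0 & 0 \\
0 & \times & \times & \times & 0 & 0 \\
0 & + & \times & \times & \times & 0 \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}
\xrightarrow{G_3} \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
\times & \times & \times & 0 & 0 & 0 \\
0 & \times & \times & \times & + & 0 \\
0 & 0 & \times & \times & \times & 0 \\
0 & 0 & + & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}$$

$$\xrightarrow{G_4} \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
\times & \times & \times & 0 & 0 & 0 \\
0 & \times & \times & \times & 0 & 0 \\
0 & 0 & \times & \times & \times & + \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & + & \times & \times
\end{bmatrix}
\xrightarrow{G_5} \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
\times & \times & \times & 0 & 0 & 0 \\
0 & \times & \times & \times & 0 & 0 \\
0 & 0 & \times & \times & \times & 0 \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}.$$

Thus, it follows from the Implicit Q theorem that the tridiagonal matrix
$Z^T T Z$ produced by this zero-chasing technique is essentially the same as the
tridiagonal matrix T obtained by the explicit method. (We may assume
that all tridiagonal matrices in question are unreduced for otherwise the
problem decouples.)

Note that at any stage of the zero-chasing, there is only one nonzero
entry outside the tridiagonal band. How this nonzero entry moves down
the matrix during the update $T \leftarrow G_k^T T G_k$ is illustrated in the following:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & c & s & 0 \\ 0 & -s & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}^T
\begin{bmatrix} a_k & b_k & z_k & 0 \\ b_k & a_p & b_p & 0 \\ z_k & b_p & a_q & b_q \\ 0 & 0 & b_q & a_r \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & c & s & 0 \\ 0 & -s & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} a_k & b_k & 0 & 0 \\ b_k & a_p & b_p & z_p \\ 0 & b_p & a_q & b_q \\ 0 & z_p & b_q & a_r \end{bmatrix}.$$

Here (p, q, r) = (k+1, k+2, k+3). This update can be performed in about
26 flops once c and s have been determined from the equation $b_k s + z_k c = 0$.
Overall we obtain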


Algorithm 8.3.2 (Implicit Symmetric QR Step with Wilkinson
Shift) Given an unreduced symmetric tridiagonal matrix $T \in \mathbb{R}^{n \times n}$, the
following algorithm overwrites T with $Z^T T Z$, where $Z = G_1 \cdots G_{n-1}$ is a
product of Givens rotations with the property that $Z^T (T - \mu I)$ is upper
triangular and $\mu$ is that eigenvalue of T's trailing 2-by-2 principal submatrix
closer to $t_{nn}$.


    d = (t_{n-1,n-1} - t_{nn})/2
    μ = t_{nn} - t_{n,n-1}^2 / (d + sign(d) sqrt(d^2 + t_{n,n-1}^2))
    x = t_{11} - μ
    z = t_{21}
    for k = 1:n-1
        [c, s] = givens(x, z)
        T = G_k^T T G_k, where G_k = G(k, k+1, θ)
        if k < n-1
            x = t_{k+1,k}
            z = t_{k+2,k}
        end
    end


This algorithm requires about 30n flops and n square roots. If a given
orthogonal matrix Q is overwritten with $Q G_1 \cdots G_{n-1}$, then an additional
$6n^2$ flops are needed. Of course, in any practical implementation the
tridiagonal matrix T would be stored in a pair of n-vectors and not in an n-by-n
array.
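The bulge-chasing step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production routine: the names `givens` and `implicit_qr_step` are hypothetical, T is held as a dense list of lists (rather than the pair of n-vectors recommended above), full rows and columns are rotated instead of only the few entries that change, sign(0) is taken as +1, and `givens` is a plain version without overflow safeguards.

```python
import math

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]]^T [a, b]^T = [r, 0]^T."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, -b / r

def implicit_qr_step(T):
    """One implicit symmetric QR step with Wilkinson shift (returns a new matrix)."""
    n = len(T)
    T = [row[:] for row in T]
    d = (T[n - 2][n - 2] - T[n - 1][n - 1]) / 2.0
    b2 = T[n - 1][n - 2] ** 2
    sign_d = 1.0 if d >= 0 else -1.0       # assumption: sign(0) taken as +1
    mu = T[n - 1][n - 1] - b2 / (d + sign_d * math.sqrt(d * d + b2))
    x = T[0][0] - mu
    z = T[1][0]
    for k in range(n - 1):
        c, s = givens(x, z)
        for j in range(n):                 # rows k, k+1 of G_k^T T
            tk, tk1 = T[k][j], T[k + 1][j]
            T[k][j] = c * tk - s * tk1
            T[k + 1][j] = s * tk + c * tk1
        for i in range(n):                 # columns k, k+1 of (G_k^T T) G_k
            tk, tk1 = T[i][k], T[i][k + 1]
            T[i][k] = c * tk - s * tk1
            T[i][k + 1] = s * tk + c * tk1
        if k < n - 2:                      # next rotation removes the bulge
            x = T[k + 1][k]
            z = T[k + 2][k]
    return T

T0 = [[1.0, 1.0, 0.0, 0.0],
      [1.0, 2.0, 1.0, 0.0],
      [0.0, 1.0, 3.0, 0.01],
      [0.0, 0.0, 0.01, 4.0]]
T1 = implicit_qr_step(T0)
```

Because the step is an orthogonal similarity, `T1` has the same trace as `T0`, remains symmetric and tridiagonal up to roundoff, and its (n, n-1) entry is much smaller than the original 0.01.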


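Example 8.3.2 If Algorithm 8.3.2 is applied to

$$T = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 3 & .01 \\ 0 & 0 & .01 & 4 \end{bmatrix},$$

then the new tridiagonal matrix T is given by

$$T = \begin{bmatrix} .5000 & .5916 & 0 & 0 \\ .5916 & 1.785 & .1808 & 0 \\ 0 & .1808 & 3.7140 & .0000044 \\ 0 & 0 & .0000044 & 4.0002497 \end{bmatrix}.$$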

Algorithm 8.3.2 is the basis of the symmetric QR algorithm, the standard
means for computing the Schur decomposition of a dense symmetric matrix.


Algorithm 8.3.3 (Symmetric QR Algorithm) Given $A \in \mathbb{R}^{n \times n}$
(symmetric) and a tolerance tol greater than the unit roundoff, this algorithm
computes an approximate symmetric Schur decomposition $Q^T A Q = D$. A
is overwritten with the tridiagonal decomposition.


    Using Algorithm 8.3.1, compute the tridiagonalization
        T = (P_1 ... P_{n-2})^T A (P_1 ... P_{n-2}).
    Set D = T and if Q is desired, form Q = P_1 ... P_{n-2}. See §5.1.6.
    until q = n
        For i = 1:n-1, set d_{i+1,i} and d_{i,i+1} to zero if
            |d_{i+1,i}| = |d_{i,i+1}| <= tol (|d_{ii}| + |d_{i+1,i+1}|)
        Find the largest q and the smallest p such that if

            D = [ D_11   0     0   ]   p
                [  0    D_22   0   ]   n-p-q
                [  0     0    D_33 ]   q

        then D_33 is diagonal and D_22 is unreduced.
        if q < n
            Apply Algorithm 8.3.2 to D_22:
                D = diag(I_p, Z, I_q)^T D diag(I_p, Z, I_q)
            If Q is desired, then Q = Q diag(I_p, Z, I_q).
        end
    end


This algorithm requires about $4n^3/3$ flops if Q is not accumulated and
about $9n^3$ flops if Q is accumulated.


Example 8.3.3 Suppose Algorithm 8.3.3 is applied to the tridiagonal matrix

$$A = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 2 & 3 & 4 & 0 \\ 0 & 4 & 5 & 6 \\ 0 & 0 & 6 & 7 \end{bmatrix}.$$

The subdiagonal entries change as follows during the execution of Algorithm 8.3.3:

    Iteration     a_21        a_32        a_43
        1        1.6817      3.2344      .8649
        2        1.6142      2.5755      .0006
        3        1.6245      1.6965      10^{-13}
        4        1.6245      1.6965      converg.
        5        1.5117      .0150
        6        1.1195      10^{-9}
        7        .7071       converg.
        8        converg.

Upon completion we find $\lambda(A) = \{-2.4848, .7046, 4.9366, 12.831\}$.


The computed eigenvalues $\hat{\lambda}_i$ obtained via Algorithm 8.3.3 are the exact
eigenvalues of a matrix that is near to A, i.e., $Q_0^T (A + E) Q_0 = \mathrm{diag}(\hat{\lambda}_i)$
where $Q_0^T Q_0 = I$ and $\|E\|_2 \approx \mathbf{u} \|A\|_2$. Using Corollary 8.1.6 we know that
the absolute error in each $\hat{\lambda}_i$ is small in the sense that $|\hat{\lambda}_i - \lambda_i| \approx \mathbf{u} \|A\|_2$.
If $\hat{Q} = [\, \hat{q}_1, \ldots, \hat{q}_n \,]$ is the computed matrix of orthonormal eigenvectors,
then the accuracy of $\hat{q}_i$ depends on the separation of $\lambda_i$ from the remainder
of the spectrum. See Theorem 8.1.12.

If all of the eigenvalues and a few of the eigenvectors are desired, then
it is cheaper not to accumulate Q in Algorithm 8.3.3. Instead, the desired
eigenvectors can be found via inverse iteration with T. See §8.2.2. Usually
just one step is sufficient to get a good eigenvector, even with a random
initial vector.

If just a few eigenvalues and eigenvectors are required, then the special
techniques in §8.5 are appropriate.

It is interesting to note the connection between Rayleigh quotient
iteration and the symmetric QR algorithm. Suppose we apply the latter
to the tridiagonal matrix $T \in \mathbb{R}^{n \times n}$ with shift $\sigma = e_n^T T e_n = t_{nn}$ where
$e_n = I_n(:, n)$. If $T - \sigma I = QR$, then we obtain $\bar{T} = RQ + \sigma I$. From the
equation $(T - \sigma I) Q = R^T$ it follows that

$$(T - \sigma I) q_n = r_{nn} e_n$$

where $q_n$ is the last column of the orthogonal matrix Q. Thus, if we apply
(8.2.6) with $x_0 = e_n$, then $x_1 = q_n$.


8.3.6 Orthogonal Iteration with Ritz Acceleration


Recall from §8.2.4 that an orthogonal iteration step involves a
matrix-matrix product and a QR factorization:

    Z_k = A \bar{Q}_{k-1}
    Q_k R_k = Z_k            (QR factorization)

Theorem 8.1.14 says that we can minimize $\|A Q_k - Q_k S\|_F$ by setting
$S = S_k \equiv Q_k^T A Q_k$. If $U_k^T S_k U_k = D_k$ is the Schur decomposition of $S_k \in \mathbb{R}^{r \times r}$
and $\bar{Q}_k = Q_k U_k$, then

$$\|A \bar{Q}_k - \bar{Q}_k D_k\|_F = \|A Q_k - Q_k S_k\|_F$$

showing that the columns of $\bar{Q}_k$ are the best possible basis to take after k
steps from the standpoint of minimizing the residual. This defines the Ritz
acceleration idea:

    \bar{Q}_0 ∈ R^{n×r} given with \bar{Q}_0^T \bar{Q}_0 = I_r
    for k = 1, 2, ...
        Z_k = A \bar{Q}_{k-1}
        Q_k R_k = Z_k                (QR factorization)
        S_k = Q_k^T A Q_k                                   (8.3.6)
        U_k^T S_k U_k = D_k          (Schur decomposition)
        \bar{Q}_k = Q_k U_k
    end

It can be shown that if

$$D_k = \mathrm{diag}(\theta_1^{(k)}, \ldots, \theta_r^{(k)}), \qquad |\theta_1^{(k)}| \ge \cdots \ge |\theta_r^{(k)}|$$

then

$$|\theta_i^{(k)} - \lambda_i(A)| = O\!\left( \left| \frac{\lambda_{r+1}}{\lambda_i} \right|^k \right), \qquad i = 1{:}r .$$

Recall that Theorem 8.2.2 says the eigenvalues of $Q_k^T A Q_k$ converge with
rate $|\lambda_{r+1}/\lambda_r|^k$. Thus, the Ritz values converge at a more favorable rate.
For details, see Stewart (1969).


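Example 8.3.4 If we apply (8.3.6) with

$$A = \begin{bmatrix} 100 & 1 & 1 & 1 \\ 1 & 99 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \qquad \text{and} \qquad \bar{Q}_0 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$

then

    k    dist(D_2(A), \bar{Q}_k)
    0        .2 × 10^{-1}
    1        .5 × 10^{-3}
    2        .1 × 10^{-4}
    3        .3 × 10^{-6}
    4        .8 × 10^{-8}

Clearly, convergence is taking place at the rate $(2/99)^k$.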


Problems 


P8.3.1 Suppose $\lambda$ is an eigenvalue of a symmetric tridiagonal matrix T. Show that if
$\lambda$ has algebraic multiplicity k, then at least k-1 of T's subdiagonal elements are zero.


P8.3.2 Suppose A is symmetric and has bandwidth p. Show that if we perform the
shifted QR step $A - \mu I = QR$, $\bar{A} = RQ + \mu I$, then $\bar{A}$ has bandwidth p.


P8.3.3 Suppose $B \in \mathbb{R}^{n \times n}$ is upper bidiagonal with diagonal entries d(1:n) and
superdiagonal entries f(1:n-1). State and prove a singular value version of Theorem 8.3.1.


P8.3.4 Let $A = \begin{bmatrix} w & x \\ x & z \end{bmatrix}$ be real and suppose we perform the following shifted QR
step: $A - zI = UR$, $\bar{A} = RU + zI$. Show that if $\bar{A} = \begin{bmatrix} \bar{w} & \bar{x} \\ \bar{x} & \bar{z} \end{bmatrix}$ then

$$\begin{aligned}
\bar{w} &= w + x^2 (w - z) / [(w - z)^2 + x^2] \\
\bar{z} &= z - x^2 (w - z) / [(w - z)^2 + x^2] \\
\bar{x} &= -x^3 / [(w - z)^2 + x^2].
\end{aligned}$$


P8.3.5 Suppose $A \in \mathbb{C}^{n \times n}$ is Hermitian. Show how to construct unitary Q such that
$Q^H A Q = T$ is real, symmetric, and tridiagonal.

P8.3.6 Show that if $A = B + iC$ is Hermitian, then $M = \begin{bmatrix} B & -C \\ C & B \end{bmatrix}$ is symmetric.
Relate the eigenvalues and eigenvectors of A and M.


P8.3.7 Rewrite Algorithm 8.3.2 for the case when A is stored in two n-vectors. Justify
the given flop count.


P8.3.8 Suppose $A = S + \sigma u u^T$ where $S \in \mathbb{R}^{n \times n}$ is skew-symmetric ($S^T = -S$), $u \in \mathbb{R}^n$
has unit 2-norm, and $\sigma \in \mathbb{R}$. Show how to compute an orthogonal Q such that $Q^T A Q$
is tridiagonal and $Q^T u = I_n(:, 1) = e_1$.


Notes and References for Sec. 8.3 


The tridiagonalization of a symmetric matrix is discussed in

ACM 17, 20-24.

H. Bowdler, R.S. Martin, C. Reinsch, and J.H. Wilkinson (1968). “The QR and QL
Algorithms for Symmetric Matrices,” Numer. Math. 11, 293-306. See also Wilkinson
and Reinsch (1971, pp.227-40).

A. Dubrulle, R.S. Martin, and J.H. Wilkinson (1968). “The Implicit QL Algorithm,”
Numer. Math. 12, 377-83. See also Wilkinson and Reinsch (1971, pp.241-48).

G.W. Stewart (1970). “Incorporating Origin Shifts into the QR Algorithm for
Symmetric Tridiagonal Matrices,” Comm. ACM 13, 365-67.

A. Dubrulle (1970). “A Short Note on the Implicit QL Algorithm for Symmetric
Tridiagonal Matrices,” Numer. Math. 15, 450.


Extensions to Hermitian and skew-symmetric matrices are described in 


D. Mueller (1966). “Householder's Method for Complex Matrices and Hermitian
Matrices,” Numer. Math. 8, 72-92.

R.C. Ward and L.J. Gray (1978). “Eigensystem Computation for Skew-Symmetric and
A Class of Symmetric Matrices,” ACM Trans. Math. Soft. 4, 278-85.


The convergence properties of Algorithm 8.3.3 are detailed in Lawson and Hanson (1974,
Appendix B), as well as in


J.H. Wilkinson (1968b). “Global Convergence of Tridiagonal QR Algorithm With Origin
Shifts,” Lin. Alg. and Its Applic. 1, 409-20.

T.J. Dekker and J.F. Traub (1971). “The Shifted QR Algorithm for Hermitian Matrices,” 
Lin. Alg. and Its Applic. 4, 137-54. 

W. Hoffman and B.N. Parlett (1978). “A New Proof of Global Convergence for the
Tridiagonal QL Algorithm,” SIAM J. Num. Anal. 15, 929-37.

S. Batterson (1994). “Convergence of the Francis Shifted QR Algorithm on Normal
Matrices,” Lin. Alg. and Its Applic. 207, 181-195.


For an analysis of the method when it is applied to normal matrices see 


C.P. Huang (1981). “On the Convergence of the QR Algorithm with Origin Shifts for 
Normal Matrices,” IMA J. Num. Anal. 1, 127-33. 


Interesting papers concerned with shifting in the tridiagonal QR algorithm include 


F.L. Bauer and C. Reinsch (1968). “Rational QR Transformations with Newton Shift 
for Symmetric Tridiagonal Matrices,” Numer. Math. 11, 264-72. See also Wilkinson 
and Reinsch (1971, pp.257-65). 

G.W. Stewart (1970). “Incorporating Origin Shifts into the QR Algorithm for Symmetric 
Tridiagonal Matrices,” Comm. Assoc. Comp. Mach. 13, 365-67. 


Some parallel computation possibilities for the algorithms in this section are discussed in 


S. Lo, B. Philippe, and A. Sameh (1987). “A Multiprocessor Algorithm for the
Symmetric Tridiagonal Eigenvalue Problem,” SIAM J. Sci. and Stat. Comp. 8, s155-s165.

H.Y. Chang and M. Salama (1988). “A Parallel Householder Tridiagonalization Strategy 
Using Scattered Square Decomposition,” Parallel Computing 6, 297-312. 


Another way to compute a specified subset of eigenvalues is via the rational QR algo- 
rithm. In this method, the shift is determined using Newton’s method. This makes it 
possible to “steer” the iteration towards desired eigenvalues. See 


C. Reinsch and F.L. Bauer (1968). “Rational QR Transformation with Newton's Shift 
for Symmetric Tridiagonal Matrices,” Numer. Math. 11, 264-72. See also Wilkinson 
and Reinsch (1971, pp.257-65). 


Papers concerned with the symmetric QR algorithm for banded matrices include 


R.S. Martin and J.H. Wilkinson (1967). “Solution of Symmetric and Unsymmetric Band
Equations and the Calculation of Eigenvectors of Band Matrices,” Numer. Math. 9,
279-301. See also Wilkinson and Reinsch (1971, pp.70-92).

R.S. Martin, C. Reinsch, and J.H. Wilkinson (1970). “The QR Algorithm for Band
Symmetric Matrices,” Numer. Math. 16, 85-92. See also Wilkinson and Reinsch
(1971, pp.266-72).


8.4 Jacobi Methods


Jacobi methods for the symmetric eigenvalue problem attract current
attention because they are inherently parallel. They work by performing a
sequence of orthogonal similarity updates $A \leftarrow Q^T A Q$ with the property
that each new A, although full, is "more diagonal" than its predecessor.
Eventually, the off-diagonal entries are small enough to be declared zero.

After surveying the basic ideas behind the Jacobi approach we develop 
a parallel Jacobi procedure. 


8.4.1 The Jacobi Idea 


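The idea behind Jacobi's method is to systematically reduce the quantity

$$\mathrm{off}(A) = \sqrt{\sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} a_{ij}^2},$$

i.e., the "norm" of the off-diagonal elements. The tools for doing this are
rotations of the form

$$J(p, q, \theta) = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & c & \cdots & s & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -s & \cdots & c & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix}$$

(with c in positions (p,p) and (q,q), s in position (p,q), and -s in position (q,p)),
which we call Jacobi rotations. Jacobi rotations are no different from Givens
rotations, cf. §5.1.8. We submit to the name change in this section to honor
the inventor.

The basic step in a Jacobi eigenvalue procedure involves (1) choosing an
index pair (p, q) that satisfies $1 \le p < q \le n$, (2) computing a cosine-sine
pair (c, s) such that

$$\begin{bmatrix} b_{pp} & b_{pq} \\ b_{qp} & b_{qq} \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} a_{pp} & a_{pq} \\ a_{qp} & a_{qq} \end{bmatrix} \begin{bmatrix} c & s \\ -s & c \end{bmatrix} \qquad (8.4.1)$$

is diagonal, and (3) overwriting A with $B = J^T A J$ where $J = J(p, q, \theta)$.
Observe that the matrix B agrees with A except in rows and columns p
and q. Moreover, since the Frobenius norm is preserved by orthogonal
transformations we find that

$$a_{pp}^2 + a_{qq}^2 + 2 a_{pq}^2 = b_{pp}^2 + b_{qq}^2 + 2 b_{pq}^2 = b_{pp}^2 + b_{qq}^2$$

and so

$$\begin{aligned}
\mathrm{off}(B)^2 &= \|B\|_F^2 - \sum_{i=1}^{n} b_{ii}^2 \\
 &= \|A\|_F^2 - \sum_{i=1}^{n} a_{ii}^2 + (a_{pp}^2 + a_{qq}^2 - b_{pp}^2 - b_{qq}^2) \\
 &= \mathrm{off}(A)^2 - 2 a_{pq}^2 .
\end{aligned} \qquad (8.4.2)$$

It is in this sense that A moves closer to diagonal form with each Jacobi
step.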
Before we discuss how the index pair (p,q) can be chosen, let us look at 
the actual computations associated with the (p,q) subproblem. 


8.4.2 The 2-by-2 Symmetric Schur Decomposition


To say that we diagonalize in (8.4.1) is to say that

$$0 = b_{pq} = a_{pq}(c^2 - s^2) + (a_{pp} - a_{qq}) c s . \qquad (8.4.3)$$

If $a_{pq} = 0$, then we just set (c, s) = (1, 0). Otherwise define

$$\tau = \frac{a_{qq} - a_{pp}}{2 a_{pq}} \qquad \text{and} \qquad t = s/c$$

and conclude from (8.4.3), after dividing by $c^2 a_{pq}$ to obtain $1 - t^2 - 2\tau t = 0$,
that $t = \tan(\theta)$ solves the quadratic

$$t^2 + 2\tau t - 1 = 0 .$$

It turns out to be important to select the smaller (in magnitude) of the two roots,

$$t = -\tau \pm \sqrt{1 + \tau^2},$$

whereupon c and s can be resolved from the formulae

$$c = 1/\sqrt{1 + t^2}, \qquad s = tc .$$

Choosing t to be the smaller of the two roots ensures that $|\theta| \le \pi/4$ and
has the effect of minimizing the difference between B and A because

$$\|B - A\|_F^2 = 4(1 - c) \sum_{i=1,\, i \ne p,q}^{n} (a_{ip}^2 + a_{iq}^2) + 2 a_{pq}^2 / c^2 .$$

We summarize the 2-by-2 computations as follows:


Algorithm 8.4.1 Given an n-by-n symmetric A and integers p and q that
satisfy $1 \le p < q \le n$, this algorithm computes a cosine-sine pair (c, s)
such that if $B = J(p, q, \theta)^T A J(p, q, \theta)$ then $b_{pq} = b_{qp} = 0$.


    function: [c, s] = sym.schur2(A, p, q)
        if A(p, q) ≠ 0
            τ = (A(q, q) - A(p, p)) / (2 A(p, q))
            if τ ≥ 0
                t = 1/(τ + sqrt(1 + τ^2))
            else
                t = -1/(-τ + sqrt(1 + τ^2))
            end
            c = 1/sqrt(1 + t^2)
            s = t c
        else
            c = 1
            s = 0
        end

8.4.3 The Classical Jacobi Algorithm 


As we mentioned above, only rows and columns p and q are altered when
the (p, q) subproblem is solved. Once sym.schur2 determines the 2-by-2
rotation, then the update $A \leftarrow J(p, q, \theta)^T A J(p, q, \theta)$ can be implemented
in 6n flops if symmetry is exploited.

How do we choose the indices p and q? From the standpoint of
maximizing the reduction of off(A) in (8.4.2), it makes sense to choose (p, q) so
that $a_{pq}^2$ is maximal. This is the basis of the classical Jacobi algorithm.


Algorithm 8.4.2 (Classical Jacobi) Given a symmetric $A \in \mathbb{R}^{n \times n}$ and
a tolerance tol > 0, this algorithm overwrites A with $V^T A V$ where V is
orthogonal and $\mathrm{off}(V^T A V) \le tol\, \|A\|_F$.


    V = I_n; eps = tol ||A||_F
    while off(A) > eps
        Choose (p, q) so |a_{pq}| = max_{i≠j} |a_{ij}|.
        (c, s) = sym.schur2(A, p, q)
        A = J(p, q, θ)^T A J(p, q, θ)
        V = V J(p, q, θ)
    end

Since $|a_{pq}|$ is the largest off-diagonal entry, $\mathrm{off}(A)^2 \le N (a_{pq}^2 + a_{qp}^2)$ where
N = n(n-1)/2. From (8.4.2) it follows that

$$\mathrm{off}(B)^2 \le \left( 1 - \frac{1}{N} \right) \mathrm{off}(A)^2 .$$

By induction, if $A^{(k)}$ denotes the matrix A after k Jacobi updates, then

$$\mathrm{off}(A^{(k)})^2 \le \left( 1 - \frac{1}{N} \right)^k \mathrm{off}(A^{(0)})^2 .$$

This implies that the classical Jacobi procedure converges at a linear rate.

However, the asymptotic convergence rate of the method is considerably
better than linear. Schönhage (1964) and van Kempen (1966) show that
for k large enough, there is a constant c such that

$$\mathrm{off}(A^{(k+N)}) \le c \cdot \mathrm{off}(A^{(k)})^2$$

i.e., quadratic convergence. An earlier paper by Henrici (1958) established
the same result for the special case when A has distinct eigenvalues. In
the convergence theory for the Jacobi iteration, it is critical that $|\theta| \le \pi/4$.
Among other things this precludes the possibility of "interchanging" nearly
converged diagonal entries. This follows from the formulae $b_{pp} = a_{pp} - t a_{pq}$
and $b_{qq} = a_{qq} + t a_{pq}$, which can be derived from equations (8.4.1) and the
definition $t = \sin(\theta)/\cos(\theta)$.

It is customary to refer to N Jacobi updates as a sweep. Thus, after
a sufficient number of iterations, quadratic convergence is observed when
examining off(A) after every sweep.


Example 8.4.1 Applying the classical Jacobi iteration to

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix}$$

we find


There is no rigorous theory that enables one to predict the number of
sweeps that are required to achieve a specified reduction in off(A). However,
Brent and Luk (1985) have argued heuristically that the number of sweeps
is proportional to log(n) and this seems to be the case in practice.


8.4.4 The Cyclic-by-Row Algorithm


The trouble with the classical Jacobi method is that the updates involve
O(n) flops while the search for the optimal (p, q) is $O(n^2)$. One way to
address this imbalance is to fix the sequence of subproblems to be solved
in advance. A reasonable possibility is to step through all the subproblems
in row-by-row fashion. For example, if n = 4 we cycle as follows:

    (p, q) = (1,2), (1,3), (1,4), (2,3), (2,4), (3,4), (1,2), ...

This ordering scheme is referred to as cyclic-by-row and it results in the
following procedure:


Algorithm 8.4.3 (Cyclic Jacobi) Given a symmetric $A \in \mathbb{R}^{n \times n}$ and
a tolerance tol > 0, this algorithm overwrites A with $V^T A V$ where V is
orthogonal and $\mathrm{off}(V^T A V) \le tol\, \|A\|_F$.


    V = I_n
    eps = tol ||A||_F
    while off(A) > eps
        for p = 1:n-1
            for q = p+1:n
                (c, s) = sym.schur2(A, p, q)
                A = J(p, q, θ)^T A J(p, q, θ)
                V = V J(p, q, θ)
            end
        end
    end


Cyclic Jacobi also converges quadratically. (See Wilkinson (1962) and van
Kempen (1966).) However, since it does not require off-diagonal search, it
is considerably faster than Jacobi's original algorithm.
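A cyclic-by-row sweep is straightforward to sketch in Python. As before this is an illustration under stated assumptions: the names `sym_schur2`, `off`, and `cyclic_jacobi` are hypothetical, matrices are dense lists of lists, rotations update full rows and columns rather than exploiting symmetry, and the `max_sweeps` guard is an added safety assumption. The test matrix is the one from Example 8.4.1.

```python
import math

def sym_schur2(A, p, q):
    if A[p][q] == 0.0:
        return 1.0, 0.0
    tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
    if tau >= 0:
        t = 1.0 / (tau + math.sqrt(1.0 + tau * tau))
    else:
        t = -1.0 / (-tau + math.sqrt(1.0 + tau * tau))
    c = 1.0 / math.sqrt(1.0 + t * t)
    return c, t * c

def off(A):
    n = len(A)
    return math.sqrt(sum(A[i][j] ** 2
                         for i in range(n) for j in range(n) if i != j))

def cyclic_jacobi(A, tol=1e-12, max_sweeps=50):
    """Return (D, V, sweeps) with D = V^T A V nearly diagonal."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    eps = tol * math.sqrt(sum(x * x for row in A for x in row))
    sweeps = 0
    while off(A) > eps and sweeps < max_sweeps:
        for p in range(n - 1):             # fixed row-by-row ordering
            for q in range(p + 1, n):
                c, s = sym_schur2(A, p, q)
                for j in range(n):         # rows p, q of J^T A
                    ap, aq = A[p][j], A[q][j]
                    A[p][j], A[q][j] = c * ap - s * aq, s * ap + c * aq
                for i in range(n):         # columns p, q of (J^T A) J and V J
                    ap, aq = A[i][p], A[i][q]
                    A[i][p], A[i][q] = c * ap - s * aq, s * ap + c * aq
                    vp, vq = V[i][p], V[i][q]
                    V[i][p], V[i][q] = c * vp - s * vq, s * vp + c * vq
        sweeps += 1
    return A, V, sweeps

P = [[1.0, 1.0, 1.0, 1.0],
     [1.0, 2.0, 3.0, 4.0],
     [1.0, 3.0, 6.0, 10.0],
     [1.0, 4.0, 10.0, 20.0]]
D, V, sweeps = cyclic_jacobi(P)
eigs = sorted(D[i][i] for i in range(4))
```

No search is performed inside the sweep, which is the point of the cyclic ordering; the diagonal of `D` approximates the eigenvalues and the columns of `V` the corresponding eigenvectors.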


Example 8.4.2 If the cyclic Jacobi method is applied to the matrix in Example 8.4.1
we find


    O(off(A))


8.4.5 Error Analysis


Using Wilkinson's error analysis it is possible to show that if r sweeps are
needed in Algorithm 8.4.3 then the computed $d_i$ satisfy

\[
\sqrt{\sum_{i=1}^{n} (d_i - \lambda_i)^2} \le (\delta + k_r) \, \| A \|_F \, u
\]

for some ordering of A's eigenvalues $\lambda_i$. The parameter $k_r$ depends mildly
on r.

Although the cyclic Jacobi method converges quadratically, it is not 
generally competitive with the symmetric QR algorithm. For example, if 
we just count flops, then 2 sweeps of Jacobi is roughly equivalent to a com- 
plete QR reduction to diagonal form with accumulation of transformations. 
However, for small n this liability is not very dramatic. Moreover, if an
approximate eigenvector matrix V is known, then $V^T A V$ is almost diagonal,
a situation that Jacobi can exploit but not QR.

Another interesting feature of the Jacobi method is that it can compute
the eigenvalues with small relative error if A is positive definite. To
appreciate this point, note that the Wilkinson analysis cited above coupled
with the §8.1 perturbation theory ensures that the computed eigenvalues
$\hat\lambda_1 \ge \cdots \ge \hat\lambda_n$ satisfy

\[
\frac{|\hat\lambda_i - \lambda_i(A)|}{\lambda_i(A)} \approx u \, \frac{\| A \|_2}{\lambda_i(A)} \le u \, \kappa_2(A).
\]

However, a refined, componentwise error analysis by Demmel and Veselić
(1992) shows that in the positive definite case,

\[
\frac{|\hat\lambda_i - \lambda_i(A)|}{\lambda_i(A)} \approx u \, \kappa_2(D^{-1} A D^{-1}) \qquad (8.4.4)
\]

where $D = \mathrm{diag}(\sqrt{a_{11}}, \ldots, \sqrt{a_{nn}})$ and this is generally a much smaller
approximating bound. The key to establishing this result is some new
perturbation theory and a demonstration that if $A_+$ is a computed Jacobi update
obtained from the current matrix $A_c$, then the eigenvalues of $A_+$ are
relatively close to the eigenvalues of $A_c$ in the sense of (8.4.4). To make the
whole thing work in practice, the termination criterion is not based upon
the comparison of off(A) with $u \| A \|_F$ but rather on the size of each $|a_{ij}|$
compared to $u \sqrt{a_{ii} a_{jj}}$. This work is typical of a new genre of research
concerned with high-accuracy algorithms based upon careful, componentwise
error analysis. See Mathias (1995).


8.4.6 Parallel Jacobi 


Perhaps the most interesting distinction between the QR and Jacobi
approaches to the symmetric eigenvalue problem is the rich inherent
parallelism of the latter algorithm. To illustrate this, suppose n = 4 and group
the six subproblems into three rotation sets as follows:


rot.set(1) = {(1,2), (3,4)}
rot.set(2) = {(1,3), (2,4)}
rot.set(3) = {(1,4), (2,3)}

Note that all the rotations within each of the three rotation sets are
"nonconflicting." That is, subproblems (1,2) and (3,4) can be carried out in
parallel. Likewise the (1,3) and (2,4) subproblems can be executed in
parallel as can subproblems (1,4) and (2,3). In general, we say that


$(i_1, j_1), (i_2, j_2), \ldots, (i_N, j_N), \qquad N = (n-1)n/2$


is a parallel ordering of the set $\{ (i,j) \mid 1 \le i < j \le n \}$ if for s = 1:n-1
the rotation set $rot.set(s) = \{ (i_r, j_r) : r = 1 + n(s-1)/2 : ns/2 \}$ consists
of nonconflicting rotations. This requires n to be even, which we assume 
throughout this section. (The odd n case can be handled by bordering 
A with a row and column of zeros and being careful when solving the 
subproblems that involve these augmented zeros.) 

A good way to generate a parallel ordering is to visualize a chess tourna- 
ment with n players in which everybody must play everybody else exactly 
once. In the n = 8 case this entails 7 “rounds.” During round one we have 
the following four games: 


rot.set(1) = {(1,2), (3,4), (5,6), (7,8)}


i.e., 1 plays 2, 3 plays 4, etc. To set up rounds 2 through 7, player 1 stays 
put and players 2 through 8 embark on a merry-go-round: 


rot.set(2) = {(1,4), (2,6), (3,8), (5,7)}
rot.set(3) = {(1,6), (4,8), (2,7), (3,5)}
rot.set(4) = {(1,8), (6,7), (4,5), (2,3)}
rot.set(5) = {(1,7), (5,8), (3,6), (2,4)}
rot.set(6) = {(1,5), (3,7), (2,8), (4,6)}
rot.set(7) = {(1,3), (2,5), (4,7), (6,8)}


We can encode these operations in a pair of integer vectors top(1:n/2) and 
bot(1:n/2). During a given round top(k) plays bot(k), k = 1:n/2. The
pairings for the next round are obtained by updating top and bot as follows:


function: [new.top, new.bot] = music(top, bot, n)
    m = n/2
    for k = 1:m
        if k = 1
            new.top(1) = 1
        elseif k = 2
            new.top(k) = bot(1)
        elseif k > 2
            new.top(k) = top(k - 1)
        end
        if k = m
            new.bot(k) = top(k)
        else
            new.bot(k) = bot(k + 1)
        end
    end
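
As a quick check of the merry-go-round, the update can be written in Python and run through a full tournament. This is a sketch only: the 0-based list handling and the helper name rounds are ours, n is assumed even with n >= 4, and the first entry of top is kept fixed in place of the literal player 1 used above.

```python
def music(top, bot):
    """One merry-go-round step: the first entry of top stays put, the rest rotate."""
    m = len(top)
    new_top, new_bot = [0] * m, [0] * m
    for k in range(m):
        if k == 0:
            new_top[k] = top[0]        # the fixed player
        elif k == 1:
            new_top[k] = bot[0]
        else:
            new_top[k] = top[k - 1]
        if k == m - 1:
            new_bot[k] = top[k]
        else:
            new_bot[k] = bot[k + 1]
    return new_top, new_bot

def rounds(n):
    """Return the n-1 rotation sets of the parallel ordering for even n >= 4."""
    top, bot = list(range(1, n, 2)), list(range(2, n + 1, 2))
    sets = []
    for _ in range(n - 1):
        sets.append([(min(t, b), max(t, b)) for t, b in zip(top, bot)])
        top, bot = music(top, bot)
    return sets

for s, pairs in enumerate(rounds(8), start=1):
    print("rot.set(%d) =" % s, pairs)
```

Each index appears exactly once per rotation set, which is what makes the n/2 subproblems within a set nonconflicting.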


Using music we obtain the following parallel order Jacobi procedure. 


Algorithm 8.4.4 (Parallel Order Jacobi) Given a symmetric $A \in \mathbb{R}^{n \times n}$
and a tolerance tol > 0, this algorithm overwrites A with $V^T A V$ where V
is orthogonal and $\mathrm{off}(V^T A V) \le tol \, \| A \|_F$. It is assumed that n is even.


    V = I_n
    eps = tol ||A||_F
    top = 1:2:n; bot = 2:2:n
    while off(A) > eps
        for set = 1:n-1
            for k = 1:n/2
                p = min(top(k), bot(k))
                q = max(top(k), bot(k))
                (c, s) = sym.schur2(A, p, q)
                A = J(p,q,θ)^T A J(p,q,θ)
                V = V J(p,q,θ)
            end
            [top, bot] = music(top, bot, n)
        end
    end
Notice that the k-loop steps through n/2 independent, nonconflicting sub-
problems. 


8.4.7 A Ring Procedure


We now discuss how Algorithm 8.4.4 could be implemented on a ring of p
processors. We assume that p = n/2 for clarity. At any instant, Proc(μ)
houses two columns of A and the corresponding V columns. For example,
if n = 8 then here is how the column distribution of A proceeds from step
to step:


            Proc(1)    Proc(2)    Proc(3)    Proc(4)

Step 1:     [1 2]      [3 4]      [5 6]      [7 8]
Step 2:     [1 4]      [2 6]      [3 8]      [5 7]
Step 3:     [1 6]      [4 8]      [2 7]      [3 5]
            etc.


The ordered pairs denote the indices of the housed columns. The first index 
names the left column and the second index names the right column. Thus, 
the left and right columns in Proc(3) during step 3 are 2 and 7 respectively. 

Note that in between steps, the columns are shuffled according to the
permutation implicit in music and that nearest neighbor communication
prevails. At each step, each processor oversees a single subproblem. This
involves (a) computing an orthogonal $V_{small} \in \mathbb{R}^{2 \times 2}$ that solves a local
2-by-2 Schur problem, (b) using the 2-by-2 $V_{small}$ to update the two housed
columns of A and V, (c) sending the 2-by-2 $V_{small}$ to all the other
processors, and (d) receiving the $V_{small}$ matrices from the other processors and
updating the local portions of A and V accordingly. Since A is stored by
column, communication is necessary to carry out the $V_{small}$ updates
because they affect rows of A. For example, in the second step of the n = 8
problem, Proc(2) must receive the 2-by-2 rotations associated with
subproblems (1,4), (3,8), and (5,7). These come from Proc(1), Proc(3), and
Proc(4) respectively. In general, the sharing of the rotation matrices can
be conveniently implemented by circulating the 2-by-2 $V_{small}$ matrices in
"merry-go-round" fashion around the ring. Each processor copies a
passing 2-by-2 $V_{small}$ into its local memory and then appropriately updates the
locally housed portions of A and V.

The termination criterion in Algorithm 8.4.4 poses something of a
problem in a distributed memory environment in that the values of off(·) and
$\| A \|_F$ require access to all of A. However, these global quantities can be
computed during the V matrix merry-go-round phase. Before the
circulation of the V's begins, each processor can compute its contribution to
$\| A \|_F$ and off(·). These quantities can then be summed by each processor
if they are placed on the merry-go-round and read at each stop. By the
end of one revolution each processor has its own copy of $\| A \|_F$ and off(·).
8.4.8 Block Jacobi Procedures


It is usually the case when solving the symmetric eigenvalue problem on a
p-processor machine that $n \gg p$. In this case a block version of the Jacobi
algorithm may be appropriate. Block versions of the above procedures are
straightforward. Suppose that n = rN and that we partition the n-by-n
matrix A as follows:

\[
A = \begin{bmatrix} A_{11} & \cdots & A_{1N} \\ \vdots & & \vdots \\ A_{N1} & \cdots & A_{NN} \end{bmatrix}
\]

Here, each $A_{ij}$ is r-by-r. In block Jacobi the (p,q) subproblem involves
computing the 2r-by-2r Schur decomposition

\[
\begin{bmatrix} V_{pp} & V_{pq} \\ V_{qp} & V_{qq} \end{bmatrix}^T
\begin{bmatrix} A_{pp} & A_{pq} \\ A_{qp} & A_{qq} \end{bmatrix}
\begin{bmatrix} V_{pp} & V_{pq} \\ V_{qp} & V_{qq} \end{bmatrix}
= \begin{bmatrix} D_{pp} & 0 \\ 0 & D_{qq} \end{bmatrix}
\]

and then applying to A the block Jacobi rotation made up of the $V_{ij}$. If
we call this block rotation V then it is easy to show that

\[
\mathrm{off}(V^T A V)^2 = \mathrm{off}(A)^2 - \left( 2 \| A_{pq} \|_F^2 + \mathrm{off}(A_{pp})^2 + \mathrm{off}(A_{qq})^2 \right).
\]

Block Jacobi procedures have many interesting computational aspects. For
example, there are many ways to solve the subproblems and the choice
appears to be critical. See Bischof (1987).


Problems 


P8.4.1 Let the scalar $\gamma$ be given along with the matrix
\[
A = \begin{bmatrix} w & x \\ x & z \end{bmatrix}.
\]
It is desired to compute an orthogonal matrix
\[
J = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}
\]
such that the (1,1) entry of $J^T A J$ equals $\gamma$. Show that this requirement leads to the
equation
\[
(w - \gamma)\tau^2 - 2x\tau + (z - \gamma) = 0,
\]
where $\tau = c/s$. Verify that this quadratic has real roots if $\gamma$ satisfies $\lambda_2 \le \gamma \le \lambda_1$, where
$\lambda_1$ and $\lambda_2$ are the eigenvalues of A.


P8.4.2 Let $A \in \mathbb{R}^{n \times n}$ be symmetric. Give an algorithm that computes the factorization
\[
Q^T A Q = \gamma I + F
\]
where Q is a product of Jacobi rotations, $\gamma = \mathrm{trace}(A)/n$, and F has zero diagonal
entries. Discuss the uniqueness of Q.


P8.4.3 Formulate Jacobi procedures for (a) skew symmetric matrices and (b) complex
Hermitian matrices.

P8.4.4 Partition the n-by-n real symmetric matrix A as follows:
\[
A = \begin{bmatrix} \alpha & v^T \\ v & A_1 \end{bmatrix}
\]
where $\alpha$ is a scalar, $v \in \mathbb{R}^{n-1}$, and $A_1$ is (n-1)-by-(n-1).

Let Q be a Householder matrix such that if $B = Q^T A Q$, then B(3:n,1) = 0. Let
$J = J(1,2,\theta)$ be determined such that if $C = J^T B J$, then $c_{12} = 0$ and $c_{11} \ge c_{22}$. Show
$c_{11} \ge \alpha + \| v \|_2$. La Budde (1964) formulated an algorithm for the symmetric eigenvalue
problem based upon repetition of this Householder-Jacobi computation.


P8.4.5 Organize function music so that it involves minimum workspace. 


P8.4.6 When implementing cyclic Jacobi, it is sensible to skip the annihilation of $a_{pq}$
if its modulus is less than some small, sweep-dependent parameter, because the net
reduction in off(A) is not worth the cost. This leads to what is called the threshold Jacobi
method. Details concerning this variant of Jacobi's algorithm may be found in Wilkinson 
(1965, p.277). Show that appropriate thresholding can guarantee convergence. 


Notes and References for Sec. 8.4 


Jacobi's original paper is one of the earliest references found in the numerical analysis 
8.5 Tridiagonal Methods


In this section we develop special methods for the symmetric tridiagonal 
eigenproblem. The tridiagonal form 


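\[
T = \begin{bmatrix}
a_1 & b_1 & & \cdots & 0 \\
b_1 & a_2 & \ddots & & \vdots \\
 & \ddots & \ddots & \ddots & \\
\vdots & & \ddots & \ddots & b_{n-1} \\
0 & \cdots & & b_{n-1} & a_n
\end{bmatrix} \qquad (8.5.1)
\]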


can be obtained by Householder reduction (cf. §8.3.1). However, symmetric 
tridiagonal eigenproblems arise naturally in many settings. 

We first discuss bisection methods that are of interest when selected 
portions of the eigensystem are required. This is followed by the presen- 
tation of a divide and conquer algorithm that can be used to acquire the 
full symmetric Schur decomposition in a way that is amenable to parallel 
processing. 


8.5.1 Eigenvalues by Bisection 


Let $T_r$ denote the leading r-by-r principal submatrix of the matrix T in
(8.5.1). Define the polynomials $p_r(x) = \det(T_r - xI)$, r = 1:n. A simple
determinantal expansion shows that

\[
p_r(x) = (a_r - x) p_{r-1}(x) - b_{r-1}^2 p_{r-2}(x) \qquad (8.5.2)
\]

for r = 2:n if we set $p_0(x) = 1$. Because $p_n(x)$ can be evaluated in O(n)
flops, it is feasible to find its roots using the method of bisection. For
example, if $p_n(y) p_n(z) < 0$ and y < z, then the iteration


    while |y - z| > ε(|y| + |z|)
        x = (y + z)/2
        if p_n(x) p_n(y) < 0
            z = x
        else
            y = x
        end
    end


is guaranteed to terminate with (y + z)/2 an approximate zero of $p_n(x)$,
i.e., an approximate eigenvalue of T. The iteration converges linearly in
that the error is approximately halved at each step.
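
The recurrence (8.5.2) and the bisection loop translate directly into code. The Python sketch below is illustrative only: the names char_poly and bisect_root, the 0-based lists a (diagonal) and b (subdiagonal), the iteration cap, and the early exit on an exact zero are our assumptions. The test matrix has diagonal (1, 2, 3, 4) and subdiagonal entries -1, for which p_n(0) = 7 and p_n(1) = -5 bracket the smallest eigenvalue.

```python
def char_poly(a, b, x):
    """Evaluate p_n(x) = det(T - xI) with the three-term recurrence (8.5.2)."""
    p_prev, p_cur = 1.0, a[0] - x                 # p_0 and p_1
    for r in range(1, len(a)):
        p_prev, p_cur = p_cur, (a[r] - x) * p_cur - b[r - 1] ** 2 * p_prev
    return p_cur

def bisect_root(a, b, y, z, eps=1e-12, max_iter=200):
    """Bisection for a zero of p_n in [y, z], assuming p_n(y) p_n(z) < 0."""
    py = char_poly(a, b, y)
    assert py * char_poly(a, b, z) < 0, "the interval must bracket a sign change"
    for _ in range(max_iter):                     # iteration cap is our safeguard
        if abs(y - z) <= eps * (abs(y) + abs(z)):
            break
        x = (y + z) / 2
        px = char_poly(a, b, x)
        if px == 0.0:
            return x                              # landed exactly on a root
        if px * py < 0:
            z = x
        else:
            y, py = x, px
    return (y + z) / 2

a = [1.0, 2.0, 3.0, 4.0]      # diagonal entries a_1..a_n
b = [-1.0, -1.0, -1.0]        # subdiagonal entries b_1..b_{n-1}
root = bisect_root(a, b, 0.0, 1.0)
```

For large n the values p_r(x) can overflow or underflow, so a production code would scale the recurrence; that refinement is omitted here.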
8.5.2 Sturm Sequence Methods


Sometimes it is necessary to compute the kth largest eigenvalue of T for
some prescribed value of k. This can be done efficiently by using the
bisection idea and the following classical result:


Theorem 8.5.1 (Sturm Sequence Property) If the tridiagonal matrix
in (8.5.1) has no zero subdiagonal entries, then the eigenvalues of $T_{r-1}$
strictly separate the eigenvalues of $T_r$:

\[
\lambda_r(T_r) < \lambda_{r-1}(T_{r-1}) < \lambda_{r-1}(T_r) < \cdots < \lambda_2(T_r) < \lambda_1(T_{r-1}) < \lambda_1(T_r).
\]

Moreover, if $a(\lambda)$ denotes the number of sign changes in the sequence

\[
\{ p_0(\lambda), p_1(\lambda), \ldots, p_n(\lambda) \}
\]

then $a(\lambda)$ equals the number of T's eigenvalues that are less than $\lambda$. Here,
the polynomials $p_r(x)$ are defined by (8.5.2) and we have the convention
that $p_r(\lambda)$ has the opposite sign of $p_{r-1}(\lambda)$ if $p_r(\lambda) = 0$.


Proof. It follows from Theorem 8.1.7 that the eigenvalues of $T_{r-1}$ weakly
separate those of $T_r$. To prove that the separation must be strict, suppose
that $p_r(\mu) = p_{r-1}(\mu) = 0$ for some r and $\mu$. It then follows from (8.5.2)
and the assumption that T is unreduced that $p_0(\mu) = p_1(\mu) = \cdots = p_r(\mu)$
= 0, a contradiction. Thus, we must have strict separation.

The assertion about $a(\lambda)$ is established in Wilkinson (1965, 300-301).
We mention that if $p_r(\lambda) = 0$, then its sign is assumed to be opposite the
sign of $p_{r-1}(\lambda)$. □


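Example 8.5.1 If

\[
T = \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 3 & -1 \\ 0 & 0 & -1 & 4 \end{bmatrix}
\]

then $\lambda(T) \approx \{ .254, 1.82, 3.18, 4.74 \}$. The sequence

\[
\{ p_0(2), p_1(2), p_2(2), p_3(2), p_4(2) \} = \{ 1, -1, -1, 0, 1 \}
\]

confirms that there are two eigenvalues less than $\lambda = 2$.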

Suppose we wish to compute $\lambda_k(T)$. From the Gershgorin theorem
(Theorem 8.1.3) it follows that $\lambda_k(T) \in [y, z]$ where

\[
y = \min_{1 \le i \le n} \; a_i - |b_i| - |b_{i-1}| , \qquad z = \max_{1 \le i \le n} \; a_i + |b_i| + |b_{i-1}|
\]

if we define $b_0 = b_n = 0$. With these starting values, it is clear from the
Sturm sequence property that the iteration

    while |z - y| > u(|y| + |z|)
        x = (y + z)/2
        if a(x) ≥ n - k + 1
            z = x                                        (8.5.3)
        else
            y = x
        end
    end


produces a sequence of subintervals that are repeatedly halved in length
but which always contain $\lambda_k(T)$.
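
A small Python sketch of this Sturm-sequence bisection follows. It is illustrative only: the names sturm_count and kth_largest are ours, the lists a and b are 0-based, a zero value of p_r is given the sign opposite to that of p_{r-1} as in Theorem 8.5.1, and the interval is moved left exactly when more than n-k eigenvalues lie below the midpoint, which is the test that reproduces the halving shown for the matrix of Example 8.5.1 with k = 3.

```python
def sturm_count(a, b, lam):
    """Number of sign changes in {p_0(lam), ..., p_n(lam)}, i.e. eigenvalues < lam."""
    n = len(a)
    p_prev, p_cur = 1.0, a[0] - lam
    sign_prev, count = 1, 0                      # p_0 = 1 is positive
    for r in range(1, n + 1):                    # p_cur holds p_r here
        if p_cur > 0:
            sign = 1
        elif p_cur < 0:
            sign = -1
        else:
            sign = -sign_prev                    # convention for p_r(lam) = 0
        if sign != sign_prev:
            count += 1
        sign_prev = sign
        if r < n:
            p_prev, p_cur = p_cur, (a[r] - lam) * p_cur - b[r - 1] ** 2 * p_prev
    return count

def kth_largest(a, b, k, u=1e-14, max_iter=200):
    """Bisection for the kth largest eigenvalue, started from Gershgorin bounds."""
    n = len(a)
    bb = [0.0] + [abs(x) for x in b] + [0.0]     # |b_0|, |b_1|, ..., |b_n|
    y = min(a[i] - bb[i] - bb[i + 1] for i in range(n))
    z = max(a[i] + bb[i] + bb[i + 1] for i in range(n))
    for _ in range(max_iter):                    # iteration cap is our safeguard
        if abs(z - y) <= u * (abs(y) + abs(z)):
            break
        x = (y + z) / 2
        if sturm_count(a, b, x) >= n - k + 1:    # lambda_k lies below x
            z = x
        else:
            y = x
    return (y + z) / 2

a = [1.0, 2.0, 3.0, 4.0]
b = [-1.0, -1.0, -1.0]
lam3 = kth_largest(a, b, 3)
```

The same counting function answers interval queries as well: sturm_count(a, b, z) - sturm_count(a, b, y) is the number of eigenvalues in [y, z).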


Example 8.5.2 If (8.5.3) is applied to the matrix of Example 8.5.1 with k = 3, then
the values shown in the following table are generated:


    y         z         x         a(x)

    0.0000    5.0000    2.5000    2
    0.0000    2.5000    1.2500    1
    1.2500    2.5000    1.3750    1
    1.3750    2.5000    1.9375    2
    1.3750    1.9375    1.6563    1
    1.6563    1.9375    1.7969    1


We conclude from the output that $\lambda_3(T) \in [1.7969, 1.9375]$. Note: $\lambda_3(T) \approx 1.82$.


During the execution of (8.5.3), information about the location of other
eigenvalues is obtained. By systematically keeping track of this
information it is possible to devise an efficient scheme for computing "contiguous"
subsets of $\lambda(T)$, e.g., $\lambda_k(T), \lambda_{k+1}(T), \ldots, \lambda_{k+j}(T)$. See Barth, Martin, and
Wilkinson (1967).

If selected eigenvalues of a general symmetric matrix A are desired,
then it is necessary first to compute the tridiagonalization $T = U_0^T A U_0$
before the above bisection schemes can be applied. This can be done using
Algorithm 8.3.1 or by the Lanczos algorithm discussed in the next chapter. 
In either case, the corresponding eigenvectors can be readily found via 
inverse iteration since tridiagonal systems can be solved in O(n) flops. See 
§4.3.6 and §8.2.2. 

In those applications where the original matrix A already has tridiagonal
form, bisection computes eigenvalues with small relative error, regardless of
their magnitude. This is in contrast to the tridiagonal QR iteration, where
the computed eigenvalues $\hat\lambda_i$ can be guaranteed only to have small absolute
error: $|\hat\lambda_i - \lambda_i(T)| \approx u \| T \|_2$.

Finally, it is possible to compute specific eigenvalues of a symmetric
matrix by using the $LDL^T$ factorization (see §4.2) and exploiting the Sylvester
inertia theorem (Theorem 8.1.17). If

\[
A - \mu I = LDL^T , \qquad A = A^T \in \mathbb{R}^{n \times n}
\]

is the $LDL^T$ factorization of $A - \mu I$ with $D = \mathrm{diag}(d_1, \ldots, d_n)$, then the
number of negative $d_i$ equals the number of $\lambda_i(A)$ that are less than $\mu$. See
Parlett (1980, p.46) for details.
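
For a symmetric tridiagonal matrix the pivots of the $LDL^T$ factorization obey a one-line recurrence, which makes the inertia count easy to illustrate. The Python sketch below is an illustration under assumptions: the name count_below is ours, no pivoting is performed, and a zero pivot before the last step is simply reported as an error rather than handled.

```python
def count_below(a, b, mu):
    """Count eigenvalues of the tridiagonal T (diagonal a, subdiagonal b) below mu
    via the pivots of T - mu*I = L D L^T (no pivoting; fails on a zero pivot)."""
    d = a[0] - mu                                # first pivot d_1
    count = 1 if d < 0 else 0
    for i in range(1, len(a)):
        if d == 0.0:
            raise ZeroDivisionError("zero pivot: shift mu slightly and retry")
        d = (a[i] - mu) - b[i - 1] ** 2 / d      # next pivot
        if d < 0:
            count += 1
    return count

a = [1.0, 2.0, 3.0, 4.0]
b = [-1.0, -1.0, -1.0]
below = [count_below(a, b, mu) for mu in (0.0, 2.5, 4.0, 5.0)]
```

With the matrix of Example 8.5.1 the counts at 0, 2.5, 4 and 5 are consistent with eigenvalues near .254, 1.82, 3.18 and 4.74.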


8.5.3 Eigensystems of Diagonal Plus Rank-1 Matrices


Our next method for the symmetric tridiagonal eigenproblem requires that
we be able to compute efficiently the eigenvalues and eigenvectors of a
matrix of the form $D + \rho zz^T$ where $D \in \mathbb{R}^{n \times n}$ is diagonal, $z \in \mathbb{R}^n$, and
$\rho \in \mathbb{R}$. This problem is important in its own right and the key computations
rest upon the following pair of results.


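Lemma 8.5.2 Suppose $D = \mathrm{diag}(d_1, \ldots, d_n) \in \mathbb{R}^{n \times n}$ has the property that
$d_1 > \cdots > d_n$. Assume that $\rho \ne 0$ and that $z \in \mathbb{R}^n$ has no zero
components. If

\[
(D + \rho zz^T) v = \lambda v , \qquad v \ne 0
\]

then $z^T v \ne 0$ and $D - \lambda I$ is nonsingular.

Proof. If $\lambda \in \lambda(D)$, then $\lambda = d_i$ for some i and thus

\[
0 = e_i^T \left[ (D - \lambda I) v + \rho (z^T v) z \right] = \rho (z^T v) z_i .
\]

Since $\rho$ and $z_i$ are nonzero we must have $0 = z^T v$ and so $Dv = \lambda v$.
However, D has distinct eigenvalues and therefore $v \in \mathrm{span}\{e_i\}$. But then
$0 = z^T v = v_i z_i$ with $v_i \ne 0$, which forces $z_i = 0$, a contradiction. Thus, D and
$D + \rho zz^T$ do not have any common eigenvalues and $z^T v \ne 0$. □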


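Theorem 8.5.3 Suppose $D = \mathrm{diag}(d_1, \ldots, d_n) \in \mathbb{R}^{n \times n}$ and that the
diagonal entries satisfy $d_1 > \cdots > d_n$. Assume that $\rho \ne 0$ and that $z \in \mathbb{R}^n$ has
no zero components. If $V \in \mathbb{R}^{n \times n}$ is orthogonal such that

\[
V^T (D + \rho zz^T) V = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)
\]

with $\lambda_1 \ge \cdots \ge \lambda_n$ and $V = [v_1, \ldots, v_n]$, then

(a) The $\lambda_i$ are the n zeros of $f(\lambda) = 1 + \rho z^T (D - \lambda I)^{-1} z$.

(b) If $\rho > 0$, then $\lambda_1 > d_1 > \lambda_2 > \cdots > \lambda_n > d_n$.
    If $\rho < 0$, then $d_1 > \lambda_1 > d_2 > \cdots > d_n > \lambda_n$.

(c) The eigenvector $v_i$ is a multiple of $(D - \lambda_i I)^{-1} z$.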

Proof. If $(D + \rho zz^T) v = \lambda v$, then

\[
(D - \lambda I) v + \rho (z^T v) z = 0. \qquad (8.5.4)
\]

We know from Lemma 8.5.2 that $D - \lambda I$ is nonsingular. Thus,

\[
v \in \mathrm{span}\{ (D - \lambda I)^{-1} z \}
\]

thereby establishing (c). Moreover, if we apply $z^T (D - \lambda I)^{-1}$ to both sides
of equation (8.5.4) we obtain

\[
z^T v \left( 1 + \rho z^T (D - \lambda I)^{-1} z \right) = 0.
\]

By Lemma 8.5.2, $z^T v \ne 0$ and so this shows that if $\lambda \in \lambda(D + \rho zz^T)$, then
$f(\lambda) = 0$. We must show that all the zeros of f are eigenvalues of $D + \rho zz^T$
and that the interlacing relations (b) hold.

To do this we look more carefully at the equations

\[
f(\lambda) = 1 + \rho \left( \frac{z_1^2}{d_1 - \lambda} + \cdots + \frac{z_n^2}{d_n - \lambda} \right)
\]
\[
f'(\lambda) = \rho \left( \frac{z_1^2}{(d_1 - \lambda)^2} + \cdots + \frac{z_n^2}{(d_n - \lambda)^2} \right).
\]

Note that f is monotone in between its poles. This allows us to conclude
that if $\rho > 0$, then f has precisely n roots, one in each of the intervals

\[
(d_n, d_{n-1}), \; \ldots, \; (d_2, d_1), \; (d_1, \infty).
\]

If $\rho < 0$ then f has exactly n roots, one in each of the intervals

\[
(-\infty, d_n), \; (d_n, d_{n-1}), \; \ldots, \; (d_2, d_1).
\]

In either case, it follows that the zeros of f are precisely the eigenvalues of
$D + \rho zz^T$. □


The theorem suggests that to compute V we (a) find the roots $\lambda_1, \ldots, \lambda_n$
of f using a Newton-like procedure and then (b) compute the columns of
V by normalizing the vectors $(D - \lambda_i I)^{-1} z$ for i = 1:n. The same plan of
attack can be followed even if there are repeated $d_i$ and zero $z_i$.
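
The two-step plan can be illustrated with a short Python sketch. To keep it simple and robust, plain bisection on each interlacing interval is substituted for the Newton-like root finder mentioned in the text, and only the case $\rho > 0$ with distinct $d_i$ and nonzero $z_i$ is handled. The names secular_f and rank_one_eig are ours, and the top bracket uses the bound $\lambda_1 \le d_1 + \rho z^T z$.

```python
import math

def secular_f(d, z, rho, lam):
    """f(lam) = 1 + rho * sum_j z_j^2 / (d_j - lam)."""
    return 1.0 + rho * sum(zj * zj / (dj - lam) for dj, zj in zip(d, z))

def rank_one_eig(d, z, rho, iters=200):
    """Eigenpairs of diag(d) + rho*z*z^T, assuming rho > 0, d strictly
    decreasing and no zero z_j. Returns eigenvalues in decreasing order."""
    n = len(d)
    assert rho > 0 and all(zj != 0 for zj in z)
    assert all(d[i] > d[i + 1] for i in range(n - 1))
    # One bracket per root: (d_1, d_1 + rho z^T z], then (d_{i+1}, d_i).
    brackets = [(d[0], d[0] + rho * sum(zj * zj for zj in z))]
    brackets += [(d[i], d[i - 1]) for i in range(1, n)]
    lams, vecs = [], []
    for lo, hi in brackets:
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if mid <= lo or mid >= hi:           # interval exhausted in floating point
                break
            if secular_f(d, z, rho, mid) < 0:    # f increases between poles when rho > 0
                lo = mid
            else:
                hi = mid
        lam = 0.5 * (lo + hi)
        v = [zj / (dj - lam) for dj, zj in zip(d, z)]   # (D - lam I)^{-1} z
        nrm = math.sqrt(sum(x * x for x in v))
        lams.append(lam)
        vecs.append([x / nrm for x in v])
    return lams, vecs

d, z, rho = [3.0, 2.0, 1.0], [1.0, 1.0, 1.0], 1.0
lams, vecs = rank_one_eig(d, z, rho)
```

When a root lies extremely close to a pole, the eigenvector formula loses accuracy; the stability issues behind that remark are the subject of the references cited below.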


Theorem 8.5.4 If $D = \mathrm{diag}(d_1, \ldots, d_n)$ and $z \in \mathbb{R}^n$, then there exists an
orthogonal matrix $V_1$ such that if $V_1^T D V_1 = \mathrm{diag}(\mu_1, \ldots, \mu_n)$ and $w = V_1^T z$
then

\[
\mu_1 > \mu_2 > \cdots > \mu_r \ge \mu_{r+1} \ge \cdots \ge \mu_n ,
\]

$w_i \ne 0$ for i = 1:r, and $w_i = 0$ for i = r+1:n.


Proof. We give a constructive proof based upon two elementary
operations. (a) Suppose $d_i = d_j$ for some i < j. Let $J(i,j,\theta)$ be a Jacobi
rotation in the (i,j) plane with the property that the jth component of
$J(i,j,\theta)^T z$ is zero. It is not hard to show that $J(i,j,\theta)^T D J(i,j,\theta) = D$.
Thus, we can zero a component of z if there is a repeated $d_i$. (b) If $z_i = 0$,
$z_j \ne 0$, and i < j, then let P be the identity with columns i and j
interchanged. It follows that $P^T D P$ is diagonal, $(P^T z)_i \ne 0$, and $(P^T z)_j = 0$.
Thus, we can permute all the zero $z_i$ to the "bottom." Clearly, repetition
of (a) and (b) eventually renders the desired canonical structure. $V_1$ is the
product of the rotations. □


See Barlow (1993) and the references therein for a discussion of the solution 
procedures that we have outlined above. 


8.5.4 A Divide and Conquer Method 


We now present a divide-and-conquer method for computing the Schur
decomposition

\[
Q^T T Q = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n), \qquad Q^T Q = I \qquad (8.5.5)
\]

for tridiagonal T that involves (a) "tearing" T in half, (b) computing the
Schur decompositions of the two parts, and (c) combining the two half-sized
Schur decompositions into the required full size Schur decomposition. The
overall procedure, developed by Dongarra and Sorensen (1987), is suitable
for parallel computation.

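We first show how T can be "torn" in half with a rank-one modification.
For simplicity, assume n = 2m. Define $v \in \mathbb{R}^n$ as follows

\[
v = \begin{bmatrix} e_m \\ \theta e_1 \end{bmatrix}. \qquad (8.5.6)
\]

Note that for all $\rho \in \mathbb{R}$ the matrix $\tilde T = T - \rho vv^T$ is identical to T except
in its "middle four" entries:

\[
\tilde T(m{:}m+1, m{:}m+1) = \begin{bmatrix} a_m - \rho & b_m - \rho\theta \\ b_m - \rho\theta & a_{m+1} - \rho\theta^2 \end{bmatrix}.
\]

If we set $\rho\theta = b_m$ then

\[
T = \begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix} + \rho vv^T
\]

where

\[
T_1 = \begin{bmatrix}
a_1 & b_1 & \cdots & 0 \\
b_1 & a_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & b_{m-1} \\
0 & \cdots & b_{m-1} & \tilde a_m
\end{bmatrix}, \qquad
T_2 = \begin{bmatrix}
\tilde a_{m+1} & b_{m+1} & \cdots & 0 \\
b_{m+1} & a_{m+2} & \ddots & \vdots \\
\vdots & \ddots & \ddots & b_{n-1} \\
0 & \cdots & b_{n-1} & a_n
\end{bmatrix}
\]

and $\tilde a_m = a_m - \rho$ and $\tilde a_{m+1} = a_{m+1} - \rho\theta^2$.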
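Now suppose that we have m-by-m orthogonal matrices $Q_1$ and $Q_2$ such
that $Q_1^T T_1 Q_1 = D_1$ and $Q_2^T T_2 Q_2 = D_2$ are each diagonal. If we set

\[
U = \begin{bmatrix} Q_1 & 0 \\ 0 & Q_2 \end{bmatrix},
\]

then

\[
U^T T U = U^T \left( \begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix} + \rho vv^T \right) U = D + \rho zz^T
\]

where

\[
D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}
\]

is diagonal and

\[
z = U^T v = \begin{bmatrix} Q_1^T e_m \\ \theta Q_2^T e_1 \end{bmatrix}.
\]

Comparing these equations we see that the effective synthesis of the two
half-sized Schur decompositions requires the quick and stable computation
of an orthogonal V such that

\[
V^T (D + \rho zz^T) V = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)
\]

which we discussed in §8.5.3.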
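
The tearing step is easy to check numerically. The Python sketch below is an illustration under assumptions: the helper names tear, dense and reassemble are ours, the tridiagonal matrix is stored as a diagonal list a and a subdiagonal list b, n = 2m is even, and $\theta$ is a free nonzero parameter with $\rho = b_m/\theta$.

```python
def dense(a, b):
    """Dense symmetric tridiagonal matrix with diagonal a and subdiagonal b."""
    n = len(a)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = a[i]
    for i in range(n - 1):
        T[i][i + 1] = T[i + 1][i] = b[i]
    return T

def tear(a, b, theta=1.0):
    """Split T into T1, T2 with T = blockdiag(T1, T2) + rho*v*v^T (n = 2m)."""
    n = len(a)
    m = n // 2
    rho = b[m - 1] / theta                 # enforces rho*theta = b_m
    a1, b1 = list(a[:m]), list(b[:m - 1])
    a2, b2 = list(a[m:]), list(b[m:])
    a1[-1] -= rho                          # modified a_m
    a2[0] -= rho * theta ** 2              # modified a_{m+1}
    v = [0.0] * n
    v[m - 1], v[m] = 1.0, theta            # v = [e_m; theta*e_1]
    return (a1, b1), (a2, b2), rho, v

def reassemble(t1, t2, rho, v):
    """Form blockdiag(T1, T2) + rho*v*v^T as a dense matrix."""
    T1, T2 = dense(*t1), dense(*t2)
    m, n = len(T1), len(v)
    R = [[rho * v[i] * v[j] for j in range(n)] for i in range(n)]
    for i in range(m):
        for j in range(m):
            R[i][j] += T1[i][j]
            R[m + i][m + j] += T2[i][j]
    return R

a = [1.0, 2.0, 3.0, 4.0]
b = [-1.0, -1.0, -1.0]
t1, t2, rho, v = tear(a, b)
R = reassemble(t1, t2, rho, v)
```

Different choices of $\theta$ shift how much of the coupling $b_m$ is absorbed by each modified diagonal entry; P8.5.2 asks for a good choice.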


8.5.5 A Parallel Implementation 


Having stepped through the tearing and synthesis operations, we are ready 
to illustrate the overall process and how it can be implemented on a mul- 
tiprocessor. For clarity, assume that n = 8N for some positive integer N 
and that three levels of tearing are performed. We can depict this with a 
binary tree as shown in FIG. 8.5.1. The indices are specified in binary.
FIG. 8.5.2 depicts a single node and should be interpreted to mean that
the eigensystem for the tridiagonal T(b) is obtained from the eigensystems
of the tridiagonals T(b0) and T(b1). For example, the eigensystems for the
N-by-N matrices T(110) and T(111) are combined to produce the eigen- 
system for the 2N-by-2N tridiagonal matrix T(11). 

                                  T

                 T(0)                            T(1)

         T(00)          T(01)            T(10)          T(11)

     T(000) T(001)  T(010) T(011)    T(100) T(101)  T(110) T(111)


FIGURE 8.5.1 Computation Tree


                 T(b)

             T(b0)  T(b1)


FIGURE 8.5.2 Synthesis at a Node


With tree-structured algorithms there is always the danger that paral- 
lelism is lost as the tree is “climbed” towards the root, but this is not the 
case in our problem. To see this suppose we have 8 processors and that the 
first task of Proc(b) is to compute the Schur decomposition of T(b) where 
b = 000, 001, 010, 011,100, 101,110,111. This portion of the computation is 
perfectly load balanced and does not involve interprocessor communication. 
(We are ignoring the Theorem 8.5.4 deflations, which are unlikely to cause 
significant load imbalance.) 


At the next level there are four gluing operations to perform: T(00),
T(01), T(10), T(11). However, each of these computations neatly
subdivides and we can assign two processors to each task. For example, once
the secular equation that underlies the T(00) synthesis is known to both
Proc(000) and Proc(001), then they each can go about getting half of the
eigenvalues and corresponding eigenvectors. Likewise, 4 processors can each
be assigned to the T(0) and T(1) problem. All 8 processors can participate
in computing the eigensystem of T. Thus, at every level full parallelism
can be maintained because the eigenvalue/eigenvector computations are
independent of one another.


Problems 


P8.5.1 Suppose $\lambda$ is an eigenvalue of a symmetric tridiagonal matrix T. Show that if
$\lambda$ has algebraic multiplicity k, then at least k - 1 of T's subdiagonal elements are zero.

P8.5.2 Give an algorithm for determining $\rho$ and $\theta$ in (8.5.6) with the property that
$\theta \in \{-1, 1\}$ and $\min \{ |a_m - \rho|, |a_{m+1} - \rho| \}$ is maximized.

P8.5.3 Let $p_r(\lambda) = \det(T(1{:}r, 1{:}r) - \lambda I_r)$ where T is given by (8.5.1). Derive a
recursion for evaluating $p_n'(\lambda)$ and use it to develop a Newton iteration that can compute
eigenvalues of T.


P8.5.4 What communication is necessary between the processors assigned to a
particular $T_b$? Is it possible to share the work associated with the processing of repeated $d_i$
and zero $z_i$?


P8.5.5 If T is positive definite, does it follow that the matrices $T_1$ and $T_2$ in §8.5.4 are
positive definite?


P8.5.6 Suppose that

\[
A = \begin{bmatrix} D & v \\ v^T & d_n \end{bmatrix}
\]

where $D = \mathrm{diag}(d_1, \ldots, d_{n-1})$ has distinct diagonal entries and $v \in \mathbb{R}^{n-1}$ has no zero
entries. (a) Show that if $\lambda \in \lambda(A)$, then $D - \lambda I_{n-1}$ is nonsingular. (b) Show that if
$\lambda \in \lambda(A)$, then $\lambda$ is a zero of

\[
f(\lambda) = \lambda + \sum_{k=1}^{n-1} \frac{v_k^2}{d_k - \lambda} - d_n .
\]


P8.5.7 Suppose $A = S + \sigma uu^T$ where $S \in \mathbb{R}^{n \times n}$ is skew-symmetric, $u \in \mathbb{R}^n$, and $\sigma \in \mathbb{R}$.
Show how to compute an orthogonal Q such that $Q^T A Q = T + \sigma e_1 e_1^T$ where T is
tridiagonal and skew-symmetric and $e_1$ is the first column of $I_n$.


P8.5.8 It is known that $\lambda \in \lambda(T)$ where $T \in \mathbb{R}^{n \times n}$ is symmetric and tridiagonal with
no zero subdiagonal entries. Show how to compute x(1:n-1) from the equation $Tx = \lambda x$
given that $x_n = 1$.
Problem,” SIAM J. Matrix Anal. Appl. 14, 598-618.


Various generalizations to banded symmetric eigenproblems have been explored. 


P. Arbenz, W. Gander, and G.H. Golub (1988). “Restricted Rank Modification of the 
Symmetric Eigenvalue Problem: Theoretical Considerations,” Lin. Alg. and Its 
Applic. 104, 75-95. 

P. Arbenz and G.H. Golub (1988). “On the Spectral Decomposition of Hermitian
Matrices Subject to Indefinite Low Rank Perturbations with Applications,” SIAM J.
Matrix Anal. Appl. 9, 40-58.


A related divide and conquer method based on the “arrowhead” matrix (see P8.5.7) is 
given in 


M. Gu and S.C. Eisenstat (1995). “A Divide-and-Conquer Algorithm for the Symmetric
Tridiagonal Eigenproblem,” SIAM J. Matrix Anal. Appl. 16, 172-191.


8.6 Computing the SVD 


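There are important relationships between the singular value decomposition
of a matrix A and the Schur decompositions of the symmetric matrices

\[
A^T A, \qquad AA^T, \qquad \text{and} \qquad \begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}.
\]

Indeed, if

\[
U^T A V = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)
\]

is the SVD of $A \in \mathbb{R}^{m \times n}$ ($m \ge n$), then

\[
V^T (A^T A) V = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2) \in \mathbb{R}^{n \times n} \qquad (8.6.1)
\]

and

\[
U^T (AA^T) U = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2, \underbrace{0, \ldots, 0}_{m-n}) \in \mathbb{R}^{m \times m}. \qquad (8.6.2)
\]

Moreover, if

\[
U = [\, U_1 \;\; U_2 \,]
\]

with $U_1$ having n columns and $U_2$ having m-n columns, and we define the
orthogonal matrix $Q \in \mathbb{R}^{(m+n) \times (m+n)}$ by

\[
Q = \frac{1}{\sqrt{2}} \begin{bmatrix} V & V & 0 \\ U_1 & -U_1 & \sqrt{2}\, U_2 \end{bmatrix}
\]

then

\[
Q^T \begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix} Q = \mathrm{diag}(\sigma_1, \ldots, \sigma_n, -\sigma_1, \ldots, -\sigma_n, \underbrace{0, \ldots, 0}_{m-n}). \qquad (8.6.3)
\]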


These connections to the symmetric eigenproblem allow us to adapt the 
mathematical and algorithmic developments of the previous sections to the 
singular value problem. Good references for this section include Lawson 
and Hanson (1974) and Stewart and Sun (1990). 


8.6.1 Perturbation Theory and Properties 


We first establish perturbation results for the SVD based on the theorems
of §8.1. Recall that $\sigma_i(A)$ denotes the ith largest singular value of A.


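Theorem 8.6.1 If $A \in \mathbb{R}^{m \times n}$, then for k = 1:min(m,n)

\[
\sigma_k(A) = \max_{\substack{\dim(S) = k \\ \dim(T) = k}} \; \min_{\substack{x \in S \\ y \in T}} \frac{y^T A x}{\| x \|_2 \| y \|_2}
= \max_{\dim(S) = k} \; \min_{x \in S} \frac{\| Ax \|_2}{\| x \|_2}.
\]

Note that in this expression $S \subseteq \mathbb{R}^n$ and $T \subseteq \mathbb{R}^m$ are subspaces.


Proof. The right-most characterization follows by applying Theorem 8.1.2
to $A^T A$. The remainder of the proof we leave as an exercise. □


Corollary 8.6.2 If A and A + E are in $\mathbb{R}^{m \times n}$ with $m \ge n$, then for k = 1:n

\[
|\sigma_k(A + E) - \sigma_k(A)| \le \sigma_1(E) = \| E \|_2 .
\]

Proof. Apply Corollary 8.1.6 to

\[
\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix} \qquad \text{and} \qquad \begin{bmatrix} 0 & (A + E)^T \\ A + E & 0 \end{bmatrix}. \quad \Box
\]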

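Example 8.6.1 If

\[
A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \qquad \text{and} \qquad A + E = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6.01 \end{bmatrix}
\]

then $\sigma(A) = \{ 9.5080, .7729 \}$ and $\sigma(A + E) = \{ 9.5145, .7706 \}$. It is clear that for i = 1:2
we have $|\sigma_i(A + E) - \sigma_i(A)| \le \| E \|_2 = .01$.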
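
These numbers can be reproduced from the relation (8.6.1): for a matrix with two columns, the squared singular values are the eigenvalues of the 2-by-2 matrix $A^T A$, which have a closed form. The Python sketch below is for illustration only (the function name is ours); forming $A^T A$ explicitly is acceptable for this tiny check but, as discussed in §8.6.2, it is not how the SVD should be computed in general.

```python
import math

def singular_values_two_columns(A):
    """Singular values of an m-by-2 matrix from the eigenvalues of A^T A."""
    g11 = sum(row[0] * row[0] for row in A)
    g12 = sum(row[0] * row[1] for row in A)
    g22 = sum(row[1] * row[1] for row in A)
    mean = 0.5 * (g11 + g22)
    radius = math.hypot(0.5 * (g11 - g22), g12)
    return math.sqrt(mean + radius), math.sqrt(max(mean - radius, 0.0))

A = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
A_plus_E = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.01]]
s = singular_values_two_columns(A)
s_pert = singular_values_two_columns(A_plus_E)
shifts = [abs(x - y) for x, y in zip(s, s_pert)]
```

Both entries of shifts stay below $\| E \|_2 = .01$, in agreement with Corollary 8.6.2.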


Corollary 8.6.3 Let $A = [a_1, \ldots, a_n] \in \mathbb{R}^{m \times n}$ be a column partitioning
with $m \ge n$. If $A_r = [a_1, \ldots, a_r]$, then for r = 1:n-1

\[
\sigma_1(A_{r+1}) \ge \sigma_1(A_r) \ge \sigma_2(A_{r+1}) \ge \cdots \ge \sigma_r(A_{r+1}) \ge \sigma_r(A_r) \ge \sigma_{r+1}(A_{r+1}).
\]

Proof. Apply Corollary 8.1.7 to $A^T A$. □


This last result says that by adding a column to a matrix, the largest
singular value increases and the smallest singular value is diminished.


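Example 8.6.2

\[
A = \begin{bmatrix} 1 & 6 & 11 \\ 2 & 7 & 12 \\ 3 & 8 & 13 \\ 4 & 9 & 14 \\ 5 & 10 & 15 \end{bmatrix}
\quad \Rightarrow \quad
\begin{array}{l}
\sigma(A_1) = \{ 7.4162 \} \\
\sigma(A_2) = \{ 19.5377, 1.8095 \} \\
\sigma(A_3) = \{ 35.1272, 2.4654, 0.0000 \}
\end{array}
\]

thereby confirming Corollary 8.6.3.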


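The next result is a Wielandt-Hoffman theorem for singular values:

Theorem 8.6.4 If A and A + E are in $\mathbb{R}^{m \times n}$ with $m \ge n$, then

\[
\sum_{k=1}^{n} \left( \sigma_k(A + E) - \sigma_k(A) \right)^2 \le \| E \|_F^2 .
\]

Proof. Apply Theorem 8.1.4 to

\[
\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix} \qquad \text{and} \qquad \begin{bmatrix} 0 & (A + E)^T \\ A + E & 0 \end{bmatrix}. \quad \Box
\]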


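Example 8.6.3 If

\[
A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \qquad \text{and} \qquad A + E = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6.01 \end{bmatrix}
\]

then

\[
\sum_{k=1}^{2} \left( \sigma_k(A + E) - \sigma_k(A) \right)^2 = .472 \times 10^{-4} \le 10^{-4} = \| E \|_F^2 .
\]

See Example 8.6.1.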


For $A \in \mathbb{R}^{m \times n}$ we say that the k-dimensional subspaces $S \subseteq \mathbb{R}^n$ and
$T \subseteq \mathbb{R}^m$ form a singular subspace pair if $x \in S$ and $y \in T$ imply $Ax \in T$
and $A^T y \in S$. The following result is concerned with the perturbation of
singular subspace pairs.


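Theorem 8.6.5 Let $A, E \in \mathbb{R}^{m \times n}$ with $m \ge n$ be given and suppose that
$V \in \mathbb{R}^{n \times n}$ and $U \in \mathbb{R}^{m \times m}$ are orthogonal. Assume that

\[
V = [\, V_1 \;\; V_2 \,] , \qquad U = [\, U_1 \;\; U_2 \,]
\]

where $V_1$ and $U_1$ each have r columns, $V_2$ has n-r columns, and $U_2$ has m-r
columns, and that ran($V_1$) and ran($U_1$) form a singular subspace pair for A. Let

\[
U^H A V = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}, \qquad
U^H E V = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix}
\]

where the block rows have r and m-r rows and the block columns have r and
n-r columns, and assume that

\[
\delta = \min_{\substack{\sigma \in \sigma(A_{11}) \\ \gamma \in \sigma(A_{22})}} | \sigma - \gamma | > 0.
\]

If

\[
\| E \|_F \le \frac{\delta}{5},
\]

then there exist matrices $P \in \mathbb{R}^{(m-r) \times r}$ and $Q \in \mathbb{R}^{(n-r) \times r}$ satisfying

\[
\left\| \begin{bmatrix} Q \\ P \end{bmatrix} \right\|_F \le 4 \, \frac{\| E \|_F}{\delta}
\]

such that ran($V_1 + V_2 Q$) and ran($U_1 + U_2 P$) is a singular subspace pair for
A + E.


Proof. See Stewart (1973), Theorem 6.4. □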


Roughly speaking, the theorem says that $O(\epsilon)$ changes in A can alter a
singular subspace by an amount $\epsilon/\delta$, where $\delta$ measures the separation of
the relevant singular values.


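Example 8.6.4 The matrix $A = \mathrm{diag}(2.000, 1.001, .999) \in \mathbb{R}^{4 \times 3}$ has singular subspace
pairs (span{$v_i$}, span{$u_i$}) for i = 1, 2, 3 where $v_i = e_i^{(3)}$ and $u_i = e_i^{(4)}$. Suppose

\[
A + E = \begin{bmatrix} 2.000 & .010 & .010 \\ .010 & 1.001 & .010 \\ .010 & .010 & .999 \\ .010 & .010 & .010 \end{bmatrix}.
\]

The corresponding columns of the matrices

\[
\hat U = [\, \hat u_1 \; \hat u_2 \; \hat u_3 \,] = \begin{bmatrix} .9999 & -.0144 & .0007 \\ .0101 & .7415 & .6708 \\ .0101 & .6707 & -.7616 \\ .0051 & .0138 & -.0007 \end{bmatrix}
\]

\[
\hat V = [\, \hat v_1 \; \hat v_2 \; \hat v_3 \,] = \begin{bmatrix} .9999 & -.0143 & .0007 \\ .0101 & .7416 & .6708 \\ .0101 & .6707 & -.7416 \end{bmatrix}
\]

define singular subspace pairs for A + E. Note that the pair {span{$\hat v_i$}, span{$\hat u_i$}} is close
to {span{$v_i$}, span{$u_i$}} for i = 1 but not for i = 2 or 3. On the other hand, the singular
subspace pair {span{$\hat v_2, \hat v_3$}, span{$\hat u_2, \hat u_3$}} is close to {span{$v_2, v_3$}, span{$u_2, u_3$}}.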
8.6.2 The SVD Algorithm


We now show how a variant of the QR algorithm can be used to
compute the SVD of an $A \in \mathbb{R}^{m \times n}$ with $m \ge n$. At first glance, this appears
straightforward. Equation (8.6.1) suggests that we


• form $C = A^T A$,
• use the symmetric QR algorithm to compute $V_1^T C V_1 = \mathrm{diag}(\sigma_i^2)$,
• apply QR with column pivoting to $AV_1$ obtaining $U^T (AV_1) \Pi = R$.


Since R has orthogonal columns, it follows that $U^T A (V_1 \Pi)$ is diagonal.
However, as we saw in Example 5.3.2, the formation of $A^T A$ can lead to a
loss of information. The situation is not quite so bad here, since the original
A is used to compute U.

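A preferable method for computing the SVD is described in Golub and
Kahan (1965). Their technique finds U and V simultaneously by implicitly
applying the symmetric QR algorithm to $A^T A$. The first step is to reduce
A to upper bidiagonal form using Algorithm 5.4.2:

\[
U_B^T A V_B = \begin{bmatrix} B \\ 0 \end{bmatrix}, \qquad
B = \begin{bmatrix}
d_1 & f_1 & \cdots & 0 \\
0 & d_2 & \ddots & \vdots \\
\vdots & & \ddots & f_{n-1} \\
0 & \cdots & 0 & d_n
\end{bmatrix} \in \mathbb{R}^{n \times n}.
\]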

The remaining problem is thus to compute the SVD of B. To this end,
consider applying an implicit-shift QR step (Algorithm 8.3.2) to the tridiagonal
matrix $T = B^T B$:


• Compute the eigenvalue $\lambda$ of

\[
T(m{:}n, m{:}n) = \begin{bmatrix} d_m^2 + f_{m-1}^2 & d_m f_m \\ d_m f_m & d_n^2 + f_m^2 \end{bmatrix}, \qquad m = n - 1
\]

that is closer to $d_n^2 + f_m^2$.


• Compute $c_1 = \cos(\theta_1)$ and $s_1 = \sin(\theta_1)$ such that

\[
\begin{bmatrix} c_1 & s_1 \\ -s_1 & c_1 \end{bmatrix}^T \begin{bmatrix} d_1^2 - \lambda \\ d_1 f_1 \end{bmatrix} = \begin{bmatrix} \times \\ 0 \end{bmatrix}
\]

and set $G_1 = G(1, 2, \theta_1)$.

• Compute Givens rotations $G_2, \ldots, G_{n-1}$ so that if $Q = G_1 \cdots G_{n-1}$
then $Q^T T Q$ is tridiagonal and $Q e_1 = G_1 e_1$.

Note that these calculations require the explicit formation of BT B, which, 
as we have seen, is unwise from the numerical standpoint. 

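Suppose instead that we apply the Givens rotation $G_1$ above to $B$ directly.
Illustrating with the $n = 6$ case this gives

\[
B \leftarrow BG_1 = \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
+ & \times & \times & 0 & 0 & 0 \\
0 & 0 & \times & \times & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}.
\]

We then can determine Givens rotations $U_1, V_2, U_2, \ldots, V_{n-1}$, and $U_{n-1}$ to
chase the unwanted nonzero element down the bidiagonal:

\[
B \leftarrow U_1^TB = \begin{bmatrix}
\times & \times & + & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & 0 & \times & \times & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}
\]

\[
B \leftarrow BV_2 = \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & + & \times & \times & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}
\]

\[
B \leftarrow U_2^TB = \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & + & 0 & 0 \\
0 & 0 & \times & \times & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}
\]

and so on. The process terminates with a new bidiagonal $\bar B$ that is related
to $B$ as follows:

\[
\bar B = (U_{n-1}^T \cdots U_1^T)\, B\, (G_1V_2 \cdots V_{n-1}) = \bar U^TB\bar V.
\]

Since each $V_i$ has the form $V_i = G(i, i+1, \theta_i)$ where $i = 2{:}n-1$, it follows
that $\bar Ve_1 = Qe_1$. By the implicit Q theorem we can assert that $\bar V$ and $Q$
are essentially the same. Thus, we can implicitly effect the transition from
$T$ to $\bar T = \bar B^T\bar B$ by working directly on the bidiagonal matrix $B$.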

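Of course, for these claims to hold it is necessary that the underlying
tridiagonal matrices be unreduced. Since the subdiagonal entries of $B^TB$
are of the form $d_{i-1}f_{i-1}$, it is clear that we must search the bidiagonal band
for zeros. If $f_k = 0$ for some $k$, then

\[
B = \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}
\begin{matrix} k \\ n-k \end{matrix}
\]

(with column blocks of widths $k$ and $n-k$) and the original SVD problem
decouples into two smaller problems involving the matrices $B_1$ and $B_2$.
If $d_k = 0$ for some $k < n$, then premultiplication
by a sequence of Givens transformations can zero $f_k$. For example, if $n = 6$
and $k = 3$, then by rotating in row planes (3,4), (3,5), and (3,6) we can
zero the entire third row:

\[
B = \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & 0 & 0 & \times & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}
\ \xrightarrow{(3,4)}\
\begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & + & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}
\]

\[
\xrightarrow{(3,5)}\
\begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & + \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}
\ \xrightarrow{(3,6)}\
\begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}
\]

If $d_n = 0$, then the last column can be zeroed with a series of column
rotations in planes $(n-1, n), (n-2, n), \ldots, (1, n)$. Thus, we can decouple
if $f_1 \cdots f_{n-1} = 0$ or $d_1 \cdots d_n = 0$.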


Algorithm 8.6.1 (Golub-Kahan SVD Step) Given a bidiagonal matrix
$B \in \mathbb{R}^{n\times n}$ having no zeros on its diagonal or superdiagonal, the following
algorithm overwrites $B$ with the bidiagonal matrix $\bar B = \bar U^TB\bar V$ where $\bar U$
and $\bar V$ are orthogonal and $\bar V$ is essentially the orthogonal matrix that would
be obtained by applying Algorithm 8.3.2 to $T = B^TB$.

    Let $\mu$ be the eigenvalue of the trailing 2-by-2 submatrix of $T = B^TB$
        that is closer to $t_{nn}$.
    $y = t_{11} - \mu$
    $z = t_{12}$
    for $k = 1{:}n-1$
        Determine $c = \cos(\theta)$ and $s = \sin(\theta)$ such that
            $[\, y \ \ z \,] \begin{bmatrix} c & s \\ -s & c \end{bmatrix} = [\, * \ \ 0 \,]$
        $B = BG(k, k+1, \theta)$
        $y = b_{kk}$; $z = b_{k+1,k}$
        Determine $c = \cos(\theta)$ and $s = \sin(\theta)$ such that
            $\begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} * \\ 0 \end{bmatrix}$
        $B = G(k, k+1, \theta)^TB$
        if $k < n-1$
            $y = b_{k,k+1}$; $z = b_{k,k+2}$
        end
    end

An efficient implementation of this algorithm would store $B$'s diagonal and
superdiagonal in vectors $d(1{:}n)$ and $f(1{:}n-1)$ respectively and would
require $30n$ flops and $2n$ square roots. Accumulating $U$ requires $6mn$ flops.
Accumulating $V$ requires $6n^2$ flops.

Typically, after a few of the above SVD iterations, the superdiagonal
entry $f_{n-1}$ becomes negligible. Criteria for smallness within $B$'s band are
usually of the form

\[
|f_i| \le \epsilon\,(|d_i| + |d_{i+1}|), \qquad |d_i| \le \epsilon\,\|B\|,
\]

where $\epsilon$ is a small multiple of the unit roundoff and $\|\cdot\|$ is some
computationally convenient norm.

Combining Algorithm 5.4.2 (bidiagonalization), Algorithm 8.6.1, and
the decoupling calculations mentioned earlier gives


Algorithm 8.6.2 (The SVD Algorithm) Given $A \in \mathbb{R}^{m\times n}$ ($m \ge n$) and
$\epsilon$, a small multiple of the unit roundoff, the following algorithm overwrites
$A$ with $U^TAV = D + E$, where $U \in \mathbb{R}^{m\times m}$ is orthogonal, $V \in \mathbb{R}^{n\times n}$ is
orthogonal, $D \in \mathbb{R}^{m\times n}$ is diagonal, and $E$ satisfies $\|E\|_2 \approx \mathbf{u}\|A\|_2$.

    Use Algorithm 5.4.2 to compute the bidiagonalization
        $\begin{bmatrix} B \\ 0 \end{bmatrix} \leftarrow (U_1 \cdots U_n)^TA(V_1 \cdots V_{n-2})$
    until $q = n$
        Set $b_{i,i+1}$ to zero if $|b_{i,i+1}| \le \epsilon(|b_{ii}| + |b_{i+1,i+1}|)$
            for any $i = 1{:}n-1$.
        Find the largest $q$ and the smallest $p$ such that if
            $B = \begin{bmatrix} B_{11} & 0 & 0 \\ 0 & B_{22} & 0 \\ 0 & 0 & B_{33} \end{bmatrix}$  (block sizes $p$, $n-p-q$, $q$)
            then $B_{33}$ is diagonal and $B_{22}$ has nonzero superdiagonal.
        if $q < n$
            if any diagonal entry in $B_{22}$ is zero, then zero
                the superdiagonal entry in the same row.
            else
                Apply Algorithm 8.6.1 to $B_{22}$,
                $B = \mathrm{diag}(I_p, U, I_{q+m-n})^T\, B\, \mathrm{diag}(I_p, V, I_q)$
            end
        end
    end


The amount of work required by this algorithm and its numerical properties
are discussed in §5.4.5 and §5.5.8.


Example 8.6.5 If Algorithm 8.6.2 is applied to

\[
A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 4 \end{bmatrix}
\]

then the superdiagonal elements converge to zero as follows:

    Iteration   O(|a_21|)    O(|a_32|)    O(|a_43|)
        2       10^0         10^0         10^0
        3       10^0         10^0         10^0
        4       10^0         10^{-1}      10^{-2}
        5       10^0         10^{-1}      10^{-8}
        6       10^0         10^{-1}      10^{-27}
        7       10^0         10^{-1}      converg.
        8       10^0         10^{-?}
        9       10^{-1}      10^{-14}
       10       10^{-1}      converg.
       11       10^{-4}
       12       10^{-12}
       13       converg.

Observe the cubic-like convergence.

8.6.3 Jacobi SVD Procedures


It is straightforward to adapt the Jacobi procedures of §8.4 to the SVD
problem. Instead of solving a sequence of 2-by-2 symmetric eigenproblems,
we solve a sequence of 2-by-2 SVD problems. Thus, for a given index pair
$(p, q)$ we compute a pair of rotations such that

\[
\begin{bmatrix} c_1 & s_1 \\ -s_1 & c_1 \end{bmatrix}^T
\begin{bmatrix} a_{pp} & a_{pq} \\ a_{qp} & a_{qq} \end{bmatrix}
\begin{bmatrix} c_2 & s_2 \\ -s_2 & c_2 \end{bmatrix}
= \begin{bmatrix} d_p & 0 \\ 0 & d_q \end{bmatrix}.
\]

See P8.6.7. The resulting algorithm is referred to as two-sided because each
update involves a pre- and post-multiplication.

A one-sided Jacobi algorithm involves a sequence of pairwise column
orthogonalizations. For a given index pair $(p, q)$ a Jacobi rotation $J(p, q, \theta)$
is determined so that columns $p$ and $q$ of $AJ(p, q, \theta)$ are orthogonal to each
other. See P8.6.8. Note that this corresponds to zeroing the $(p, q)$ and $(q, p)$
entries in $A^TA$. Once $AV$ has sufficiently orthogonal columns, the rest of
the SVD ($U$ and $\Sigma$) follows from column scaling: $AV = U\Sigma$.
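A minimal NumPy sketch of the one-sided procedure follows. It is an illustration rather than a tuned implementation: it assumes $A$ has full column rank (so the final column scaling is well defined), uses a simple cyclic sweep over index pairs, and chooses the rotation from the three inner products of the two columns. The function name, tolerance, and sweep limit are assumptions made for the example.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Return U, sigma, V with A ~= U @ diag(sigma) @ V.T (sigma unsorted)."""
    A = np.array(A, dtype=float)
    n = A.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = A[:, p] @ A[:, p]
                beta = A[:, q] @ A[:, q]
                gamma = A[:, p] @ A[:, q]      # (p, q) entry of A^T A
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue                    # columns already orthogonal enough
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                if zeta == 0.0:
                    t = 1.0
                else:
                    t = np.sign(zeta) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                # Post-multiply A and V by the rotation in the (p, q) plane.
                for M in (A, V):
                    mp, mq = M[:, p].copy(), M[:, q].copy()
                    M[:, p] = c * mp - s * mq
                    M[:, q] = s * mp + c * mq
        if not rotated:
            break
    sigma = np.linalg.norm(A, axis=0)   # column scaling: A V = U Sigma
    U = A / sigma
    return U, sigma, V
```

The singular values come out in no particular order; sorting them (and permuting the columns of `U` and `V` accordingly) recovers the usual convention.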


Problems 


P8.6.1 Show that if $B \in \mathbb{R}^{n\times n}$ is an upper bidiagonal matrix having a repeated singular
value, then $B$ must have a zero on its diagonal or superdiagonal.


P8.6.2 Give formulae for the eigenvectors of $\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}$ in terms of the singular
vectors of $A \in \mathbb{R}^{m\times n}$ where $m \ge n$.


P8.6.3 Give an algorithm for reducing a complex matrix $A$ to real bidiagonal form
using complex Householder transformations.


P8.6.4 Relate the singular values and vectors of $A = B + iC$ ($B, C \in \mathbb{R}^{m\times n}$) to those
of $\begin{bmatrix} B & -C \\ C & B \end{bmatrix}$.

P8.6.5 Complete the proof of Theorem 8.6.1.


P8.6.6 Assume that $n = 2m$ and that $S \in \mathbb{R}^{n\times n}$ is skew-symmetric and tridiagonal.
Show that there exists a permutation $P \in \mathbb{R}^{n\times n}$ such that $P^TSP$ has the following form:

\[
P^TSP = \begin{bmatrix} 0 & -B^T \\ B & 0 \end{bmatrix}
\]

where each block is $m$-by-$m$. Describe $B$. Show how to compute the eigenvalues and eigenvectors of $S$ via the SVD
of $B$. Repeat for the case $n = 2m + 1$.


P8.6.7 (a) Let

\[
C = \begin{bmatrix} w & x \\ y & z \end{bmatrix}
\]

be real. Give a stable algorithm for computing $c$ and $s$ with $c^2 + s^2 = 1$ such that

\[
B = \begin{bmatrix} c & s \\ -s & c \end{bmatrix} C
\]

is symmetric. (b) Combine (a) with the Jacobi trigonometric calculations in the text
to obtain a stable algorithm for computing the SVD of $C$. (c) Part (b) can be used to
develop a Jacobi-like algorithm for computing the SVD of $A \in \mathbb{R}^{n\times n}$. For a given $(p, q)$
with $p < q$, Jacobi transformations $J(p, q, \theta_1)$ and $J(p, q, \theta_2)$ are determined such that if

\[
B = J(p, q, \theta_1)^T A J(p, q, \theta_2),
\]

then $b_{pq} = b_{qp} = 0$. Show

\[
\mathrm{off}(B)^2 = \mathrm{off}(A)^2 - a_{pq}^2 - a_{qp}^2.
\]

How might $p$ and $q$ be determined? How could the algorithm be adapted to handle the
case when $A \in \mathbb{R}^{m\times n}$ with $m > n$?


P8.6.8 Let $x$ and $y$ be in $\mathbb{R}^m$ and define the orthogonal matrix $Q$ by

\[
Q = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}.
\]

Give a stable algorithm for computing $c$ and $s$ such that the columns of $[x, y]Q$ are
orthogonal to each other.


P8.6.9 Suppose $B \in \mathbb{R}^{n\times n}$ is upper bidiagonal with $b_{nn} = 0$. Show how to construct
orthogonal $U$ and $V$ (product of Givens rotations) so that $U^TBV$ is upper bidiagonal
with a zero $n$th column.


P8.6.10 Suppose $B \in \mathbb{R}^{n\times n}$ is upper bidiagonal with diagonal entries $d(1{:}n)$ and
superdiagonal entries $f(1{:}n-1)$. State and prove a singular value version of Theorem 8.5.1.

SVD," SIAM J. Matrix Anal. Appl. 16, 79-92.

J.W. Demmel and W. Kahan (1990). "Accurate Singular Values of Bidiagonal Matrices,"
SIAM J. Sci. and Stat. Comp. 11, 873-912.

lar value computations are discussed in


J.W. Demmel and K. Veselić (1992). "Jacobi's Method is More Accurate than QR,"
SIAM J. Matrix Anal. Appl. 15, 1204-1245.

R. Mathias (1995). "Accurate Eigensystem Computations by Jacobi Methods," SIAM
J. Matrix Anal. Appl. 16, 977-1003.

8.7 Some Generalized Eigenvalue Problems


Given a symmetric matrix $A \in \mathbb{R}^{n\times n}$ and a symmetric positive definite
$B \in \mathbb{R}^{n\times n}$, we consider the problem of finding a nonzero vector $x$ and a
scalar $\lambda$ so $Ax = \lambda Bx$. This is the symmetric-definite generalized
eigenproblem. The scalar $\lambda$ can be thought of as a generalized eigenvalue. As $\lambda$
varies, $A - \lambda B$ defines a pencil and our job is to determine

\[
\lambda(A, B) = \{\, \lambda \mid \det(A - \lambda B) = 0 \,\}.
\]

A symmetric-definite generalized eigenproblem can be transformed to an
equivalent problem with a congruence transformation:

\[
A - \lambda B \ \text{is singular} \quad \Leftrightarrow \quad (X^TAX) - \lambda(X^TBX) \ \text{is singular.}
\]

Thus, if $X$ is nonsingular, then $\lambda(A, B) = \lambda(X^TAX, X^TBX)$.

In this section we present various structure-preserving procedures that
solve such eigenproblems through the careful selection of $X$. The related
generalized singular value decomposition problem is also discussed.


8.7.1 Mathematical Background 


We seek a stable, efficient algorithm that computes $X$ such that $X^TAX$
and $X^TBX$ are both in "canonical form." The obvious form to aim for is
diagonal form.


Theorem 8.7.1 Suppose $A$ and $B$ are $n$-by-$n$ symmetric matrices, and
define $C(\mu)$ by

\[
C(\mu) = \mu A + (1 - \mu)B, \qquad \mu \in \mathbb{R}. \qquad (8.7.1)
\]

If there exists a $\mu \in [0, 1]$ such that $C(\mu)$ is non-negative definite and

\[
\mathrm{null}(C(\mu)) = \mathrm{null}(A) \cap \mathrm{null}(B)
\]

then there exists a nonsingular $X$ such that both $X^TAX$ and $X^TBX$ are
diagonal.


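Proof. Let $\mu \in [0, 1]$ be chosen so that $C(\mu)$ is non-negative definite with
the property that $\mathrm{null}(C(\mu)) = \mathrm{null}(A) \cap \mathrm{null}(B)$. Let

\[
Q_1^TC(\mu)Q_1 = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}, \qquad D = \mathrm{diag}(d_1, \ldots, d_k), \quad d_i > 0
\]

be the Schur decomposition of $C(\mu)$ and define $X_1 = Q_1\,\mathrm{diag}(D^{-1/2}, I_{n-k})$.
If $A_1 = X_1^TAX_1$, $B_1 = X_1^TBX_1$, and $C_1 = X_1^TC(\mu)X_1$, then

\[
C_1 = \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix} = \mu A_1 + (1 - \mu)B_1.
\]

Since $\mathrm{span}\{e_{k+1}, \ldots, e_n\} = \mathrm{null}(C_1) = \mathrm{null}(A_1) \cap \mathrm{null}(B_1)$ it follows that
$A_1$ and $B_1$ have the following block structure:

\[
A_1 = \begin{bmatrix} A_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad
B_1 = \begin{bmatrix} B_{11} & 0 \\ 0 & 0 \end{bmatrix},
\]

where the leading blocks are $k$-by-$k$. Moreover $I_k = \mu A_{11} + (1 - \mu)B_{11}$.
Suppose $\mu \ne 0$. It then follows that if $Z^TB_{11}Z = \mathrm{diag}(b_1, \ldots, b_k)$ is
the Schur decomposition of $B_{11}$ and we set $X = X_1\,\mathrm{diag}(Z, I_{n-k})$ then

\[
X^TBX = \mathrm{diag}(b_1, \ldots, b_k, 0, \ldots, 0) \equiv D_B
\]

and

\[
X^TAX = \frac{1}{\mu}X^T\bigl(C(\mu) - (1 - \mu)B\bigr)X
= \frac{1}{\mu}\left( \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix} - (1 - \mu)D_B \right) \equiv D_A.
\]

On the other hand, if $\mu = 0$, then let $Z^TA_{11}Z = \mathrm{diag}(a_1, \ldots, a_k)$ be the
Schur decomposition of $A_{11}$ and set $X = X_1\,\mathrm{diag}(Z, I_{n-k})$. It is easy to
verify that in this case as well, both $X^TAX$ and $X^TBX$ are diagonal. □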


Frequently, the conditions in Theorem 8.7.1 are satisfied because either A 
or B is positive definite. 


Corollary 8.7.2 If $A - \lambda B \in \mathbb{R}^{n\times n}$ is symmetric-definite, then there exists
a nonsingular $X = [x_1, \ldots, x_n]$ such that

\[
X^TAX = \mathrm{diag}(a_1, \ldots, a_n) \quad \text{and} \quad X^TBX = \mathrm{diag}(b_1, \ldots, b_n).
\]

Moreover, $Ax_i = \lambda_iBx_i$ for $i = 1{:}n$ where $\lambda_i = a_i/b_i$.

Proof. By setting $\mu = 0$ in Theorem 8.7.1 we see that symmetric-definite
pencils can be simultaneously diagonalized. The rest of the corollary is
easily verified. □
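The corollary can be checked numerically on a small pencil. The sketch below takes the 2-by-2 data to be $A = \begin{bmatrix} 229 & 163 \\ 163 & 116 \end{bmatrix}$, $B = \begin{bmatrix} 81 & 59 \\ 59 & 43 \end{bmatrix}$, and $X = \begin{bmatrix} 3 & -5 \\ -4 & 7 \end{bmatrix}$; these entries are an assumption of the illustration, chosen because they give $X^TAX = \mathrm{diag}(5, -1)$ and $X^TBX = \mathrm{diag}(1, 2)$.

```python
import numpy as np

A = np.array([[229.0, 163.0], [163.0, 116.0]])
B = np.array([[81.0, 59.0], [59.0, 43.0]])   # symmetric positive definite
X = np.array([[3.0, -5.0], [-4.0, 7.0]])     # nonsingular (determinant 1)

DA = X.T @ A @ X          # expected to be diagonal
DB = X.T @ B @ X          # expected to be diagonal with positive entries

# Generalized eigenvalues as ratios a_i / b_i of the diagonal entries.
lam = np.sort(np.diag(DA) / np.diag(DB))

# Cross-check against the eigenvalues of B^{-1} A.
lam_direct = np.sort(np.linalg.eigvals(np.linalg.solve(B, A)).real)

# Each column of X satisfies A x_i = lambda_i B x_i.
residual = A @ X - (B @ X) * (np.diag(DA) / np.diag(DB))
```

The ratios of the diagonal entries and the directly computed eigenvalues of $B^{-1}A$ agree, and `residual` vanishes up to roundoff.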


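Example 8.7.1 If

\[
A = \begin{bmatrix} 229 & 163 \\ 163 & 116 \end{bmatrix} \quad \text{and} \quad
B = \begin{bmatrix} 81 & 59 \\ 59 & 43 \end{bmatrix}
\]

then $A - \lambda B$ is symmetric-definite and $\lambda(A, B) = \{5, -1/2\}$. If

\[
X = \begin{bmatrix} 3 & -5 \\ -4 & 7 \end{bmatrix}
\]

then $X^TAX = \mathrm{diag}(5, -1)$ and $X^TBX = \mathrm{diag}(1, 2)$.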


Stewart (1979) has worked out a perturbation theory for symmetric
pencils $A - \lambda B$ that satisfy

\[
c(A, B) = \min_{\|x\|_2 = 1}\ (x^TAx)^2 + (x^TBx)^2 > 0. \qquad (8.7.2)
\]

The scalar $c(A, B)$ is called the Crawford number of the pencil $A - \lambda B$.


Theorem 8.7.3 Suppose $A - \lambda B$ is an $n$-by-$n$ symmetric-definite pencil
with eigenvalues

\[
\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n.
\]

Suppose $E_A$ and $E_B$ are symmetric $n$-by-$n$ matrices that satisfy

\[
\epsilon^2 = \|E_A\|_2^2 + \|E_B\|_2^2 < c(A, B).
\]

Then $(A + E_A) - \lambda(B + E_B)$ is symmetric-definite with eigenvalues

\[
\mu_1 \ge \cdots \ge \mu_n
\]

that satisfy

\[
|\arctan(\lambda_i) - \arctan(\mu_i)| \le \arctan(\epsilon / c(A, B))
\]

for $i = 1{:}n$.


Proof. See Stewart (1979). □


8.7.2 Methods for the Symmetric-Definite Problem 


Turning to algorithmic matters, we first present a method for solving the 
symmetric-definite problem that utilizes both the Cholesky factorization 
and the symmetric QR algorithm. 


Algorithm 8.7.1 Given $A = A^T \in \mathbb{R}^{n\times n}$ and $B = B^T \in \mathbb{R}^{n\times n}$ with $B$
positive definite, the following algorithm computes a nonsingular $X$ such
that $X^TBX = I_n$ and $X^TAX = \mathrm{diag}(a_1, \ldots, a_n)$.

    Compute the Cholesky factorization $B = GG^T$
        using Algorithm 4.2.2.
    Compute $C = G^{-1}AG^{-T}$.
    Use the symmetric QR algorithm to compute the Schur
        decomposition $Q^TCQ = \mathrm{diag}(a_1, \ldots, a_n)$.
    Set $X = G^{-T}Q$.

This algorithm requires about $14n^3$ flops. In a practical implementation,
$A$ can be overwritten by the matrix $C$. See Martin and Wilkinson (1968c)
for details. Note that

\[
\lambda(A, B) = \lambda(A, GG^T) = \lambda(G^{-1}AG^{-T}, I) = \lambda(C) = \{a_1, \ldots, a_n\}.
\]

If $\hat a_i$ is a computed eigenvalue obtained by Algorithm 8.7.1, then it can
be shown that $\hat a_i \in \lambda(G^{-1}AG^{-T} + E_i)$, where $\|E_i\|_2 \approx \mathbf{u}\|A\|_2\|B^{-1}\|_2$.
Thus, if $B$ is ill-conditioned, then $\hat a_i$ may be severely contaminated with
roundoff error even if $a_i$ is a well-conditioned generalized eigenvalue. The
problem, of course, is that in this case, the matrix $C = G^{-1}AG^{-T}$ can have
some very large entries if $B$, and hence $G$, is ill-conditioned. This difficulty
can sometimes be overcome by replacing the matrix $G$ in Algorithm 8.7.1
with $VD^{-1/2}$ where $V^TBV = D$ is the Schur decomposition of $B$. If the
diagonal entries of $D$ are ordered from smallest to largest, then the large
entries in $C$ are concentrated in the upper left-hand corner. The small
eigenvalues of $C$ can then be computed without excessive roundoff error
contamination (or so the heuristic goes). For further discussion, consult
Wilkinson (1965, pp. 337-38).
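The four steps of Algorithm 8.7.1 map directly onto NumPy calls. The sketch below is illustrative only: it uses the general-purpose `np.linalg.solve` for the triangular systems (a routine that exploits triangularity, such as `scipy.linalg.solve_triangular`, would be the natural choice in practice), and `np.linalg.eigh` stands in for the symmetric QR algorithm. The function name is hypothetical.

```python
import numpy as np

def symmetric_definite_eig(A, B):
    """Return (a, X) with X.T @ B @ X = I and X.T @ A @ X = diag(a).

    A is assumed symmetric and B symmetric positive definite.
    """
    G = np.linalg.cholesky(B)                  # B = G G^T, G lower triangular
    Y = np.linalg.solve(G, A)                  # Y = G^{-1} A
    C = np.linalg.solve(G, Y.T)                # C = G^{-1} A G^{-T} (A symmetric)
    C = (C + C.T) / 2.0                        # remove roundoff asymmetry
    a, Q = np.linalg.eigh(C)                   # Q^T C Q = diag(a)
    X = np.linalg.solve(G.T, Q)                # X = G^{-T} Q
    return a, X
```

As the text warns, the accuracy of the computed `a` degrades when `B` is ill-conditioned, so this sketch is appropriate only for well-conditioned `B`.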


Example 8.7.2 If

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix} \quad \text{and} \quad
G = \begin{bmatrix} .001 & 0 & 0 \\ 1 & .001 & 0 \\ 2 & 1 & .001 \end{bmatrix}
\]

and $B = GG^T$, then the two smallest eigenvalues of $A - \lambda B$ are

\[
a_1 = -0.619402940600584 \qquad a_2 = 1.627440079051887.
\]

If 17-digit floating point arithmetic is used, then these eigenvalues are computed to full
machine precision when the symmetric QR algorithm is applied to $fl(D^{-1/2}V^TAVD^{-1/2})$,
where $B = VDV^T$ is the Schur decomposition of $B$. On the other hand, if Algorithm
8.7.1 is applied, then

\[
\hat a_1 = -0.619373517376444 \qquad \hat a_2 = 1.627516601905228.
\]

The reason for obtaining only four correct significant digits is that $\kappa_2(B) \approx 10^{15}$.


The condition of the matrix X in Algorithm 8.7.1 can sometimes be 
improved by replacing B with a suitable convex combination of A and B. 
The connection between the eigenvalues of the modified pencil and those 
of the original are detailed in the proof of Theorem 8.7.1. 

Other difficulties concerning Algorithm 8.7.1 revolve around the fact
that $G^{-1}AG^{-T}$ is generally full even when $A$ and $B$ are sparse. This is a
serious problem, since many of the symmetric-definite problems arising in
practice are large and sparse.

Crawford (1973) has shown how to implement Algorithm 8.7.1 effectively
when $A$ and $B$ are banded. Aside from this case, however, the
simultaneous diagonalization approach is impractical for the large, sparse
symmetric-definite problem.

An alternative idea is to extend the Rayleigh quotient iteration (8.4.4)
as follows:

    $x_0$ given with $\|x_0\|_2 = 1$
    for $k = 0, 1, \ldots$
        $\mu_k = x_k^TAx_k / x_k^TBx_k$                                  (8.7.3)
        Solve $(A - \mu_kB)z_{k+1} = Bx_k$ for $z_{k+1}$.
        $x_{k+1} = z_{k+1} / \|z_{k+1}\|_2$
    end

The mathematical basis for this iteration is that

\[
\lambda = \frac{x^TAx}{x^TBx} \qquad (8.7.4)
\]

minimizes

\[
f(\lambda) = \|Ax - \lambda Bx\|_B \qquad (8.7.5)
\]

where $\|\cdot\|_B$ is defined by $\|z\|_B^2 = z^TB^{-1}z$. The mathematical properties of
(8.7.3) are similar to those of (8.4.4). Its applicability depends on whether
or not systems of the form $(A - \mu B)z = x$ can be readily solved. A similar
comment pertains to the following generalized orthogonal iteration:

    $Q_0 \in \mathbb{R}^{n\times p}$ given with $Q_0^TQ_0 = I_p$
    for $k = 1, 2, \ldots$
        Solve $BZ_k = AQ_{k-1}$ for $Z_k$.                               (8.7.6)
        $Z_k = Q_kR_k$  (QR factorization)
    end

This is mathematically equivalent to (7.3.4) with $A$ replaced by $B^{-1}A$. Its
practicality depends on how easy it is to solve linear systems of the form
$Bz = y$.

Sometimes $A$ and $B$ are so large that neither (8.7.3) nor (8.7.6) can be
invoked. In this situation, one can resort to any of a number of gradient
and coordinate relaxation algorithms. See Stewart (1976) for an extensive
guide to the literature.
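For problems small enough to solve densely, iteration (8.7.3) can be sketched as below. The stopping test on the residual $Ax - \mu Bx$, the iteration cap, and the guards against an exactly singular shifted matrix are additions for the sake of a runnable illustration and are not part of (8.7.3); the function name is hypothetical.

```python
import numpy as np

def generalized_rqi(A, B, x0, tol=1e-12, max_iter=50):
    """Generalized Rayleigh quotient iteration for A x = mu B x.

    A is assumed symmetric and B symmetric positive definite.
    Returns an approximate eigenpair (mu, x) with ||x||_2 = 1.
    """
    x = x0 / np.linalg.norm(x0)
    mu = (x @ A @ x) / (x @ B @ x)
    for _ in range(max_iter):
        if np.linalg.norm(A @ x - mu * (B @ x)) <= tol * np.linalg.norm(A):
            break
        try:
            z = np.linalg.solve(A - mu * B, B @ x)
        except np.linalg.LinAlgError:
            break                      # A - mu*B is exactly singular
        if not np.all(np.isfinite(z)):
            break
        x = z / np.linalg.norm(z)
        mu = (x @ A @ x) / (x @ B @ x)
    return mu, x
```

Which eigenpair is found depends on the starting vector. Each step requires a fresh factorization of $A - \mu_kB$, which is the cost the text refers to.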


8.7.3 The Generalized Singular Value Problem 


We conclude with some remarks about symmetric pencils that have the
form $A^TA - \lambda B^TB$ where $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times n}$. This pencil
underlies the generalized singular value decomposition (GSVD), a decomposition
that is useful in several constrained least squares problems. (Cf. §12.1.)
Note that by Theorem 8.7.1 there exists a nonsingular $X \in \mathbb{R}^{n\times n}$ such that
$X^T(A^TA)X$ and $X^T(B^TB)X$ are both diagonal. The value of the GSVD
is that these diagonalizations can be achieved without forming $A^TA$ and
$B^TB$.


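Theorem 8.7.4 (Generalized Singular Value Decomposition) If we
have $A \in \mathbb{R}^{m\times n}$ with $m \ge n$ and $B \in \mathbb{R}^{p\times n}$, then there exist orthogonal
$U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{p\times p}$ and an invertible $X \in \mathbb{R}^{n\times n}$ such that

\[
U^TAX = C = \mathrm{diag}(c_1, \ldots, c_n), \qquad c_i \ge 0
\]

and

\[
V^TBX = S = \mathrm{diag}(s_1, \ldots, s_q), \qquad s_i \ge 0
\]

where $q = \min(p, n)$.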


Proof. The proof of this decomposition appears in Van Loan (1976). We
present a more constructive proof along the lines of Paige and Saunders
(1981). For clarity we assume that $\mathrm{null}(A) \cap \mathrm{null}(B) = \{0\}$ and $p \ge n$. We
leave it to the reader to extend the proof so that it covers the remaining cases.


Let

\[
\begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R \qquad (8.7.6)
\]

be a QR factorization with $Q_1 \in \mathbb{R}^{m\times n}$, $Q_2 \in \mathbb{R}^{p\times n}$, and $R \in \mathbb{R}^{n\times n}$. Paige
and Saunders show that the SVD's of $Q_1$ and $Q_2$ are related in the sense
that

\[
Q_1 = UCW^T \qquad Q_2 = VSW^T \qquad (8.7.7)
\]

Here, $U$, $V$, and $W$ are orthogonal, $C = \mathrm{diag}(c_i)$ with $0 \le c_1 \le \cdots \le c_n$, $S
= \mathrm{diag}(s_i)$ with $s_1 \ge \cdots \ge s_n$, and $C^TC + S^TS = I_n$. The decomposition
(8.7.7) is a variant of the CS decomposition in §2.6 and from it we conclude
that $A = Q_1R = UC(W^TR)$ and $B = Q_2R = VS(W^TR)$. The theorem
follows by setting $X = (W^TR)^{-1}$, $D_A = C$, and $D_B = S$. The invertibility
of $R$ follows from our assumption that $\mathrm{null}(A) \cap \mathrm{null}(B) = \{0\}$. □


The elements of the set $\sigma(A, B) = \{c_1/s_1, \ldots, c_n/s_q\}$ are referred
to as the generalized singular values of $A$ and $B$. Note that $\sigma \in \sigma(A, B)$
implies that $\sigma^2 \in \lambda(A^TA, B^TB)$. The theorem is a generalization of the
SVD in that if $B = I_n$, then $\sigma(A, B) = \sigma(A)$.

Our proof of the GSVD is of practical importance since Stewart (1983)
and Van Loan (1985) have shown how to stably compute the CS decomposition.
The only tricky part is the inversion of $W^TR$ to get $X$. Note that
the columns of $X = [x_1, \ldots, x_n]$ satisfy

\[
s_i^2A^TAx_i = c_i^2B^TBx_i, \qquad i = 1{:}n
\]

and so if $s_i \ne 0$ then $A^TAx_i = \sigma_i^2B^TBx_i$ where $\sigma_i = c_i/s_i$. Thus, the $x_i$
are aptly termed the generalized singular vectors of the pair $(A, B)$.
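The construction in the proof can be imitated in NumPy for a well-conditioned pair. The sketch assumes $m \ge n$, $p \ge n$, a stacked matrix of full column rank, and nonzero $s_i$; it obtains $C$ and $W$ from an ordinary SVD of $Q_1$ and reads $S$ off the column norms of $Q_2W$, which is a simple stand-in for a stable CS decomposition routine and not the method of Stewart (1983) or Van Loan (1985). NumPy returns the $c_i$ in decreasing order, the reverse of the ordering used in the proof. The function name is hypothetical.

```python
import numpy as np

def gsvd_sketch(A, B):
    """Return (U, V, X, c, s) with U.T @ A @ X = diag(c), V.T @ B @ X = diag(s)."""
    m = A.shape[0]
    Q, R = np.linalg.qr(np.vstack([A, B]))        # reduced QR of the stacked matrix
    Q1, Q2 = Q[:m], Q[m:]
    U, c, Wt = np.linalg.svd(Q1, full_matrices=False)   # Q1 = U diag(c) W^T
    Y = Q2 @ Wt.T                                  # columns are orthogonal: Y^T Y = I - C^2
    s = np.linalg.norm(Y, axis=0)
    V = Y / s                                      # Q2 = V diag(s) W^T (needs s_i != 0)
    X = np.linalg.inv(Wt @ R)                      # X = (W^T R)^{-1}
    return U, V, X, c, s
```

Here `U` and `V` have orthonormal columns but are not square, since the reduced factorizations are used. The explicit inverse mirrors the text's remark that forming $X$ is the delicate part.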




In several applications an orthonormal basis for some designated generalized
singular vector subspace $\mathrm{span}\{x_{i_1}, \ldots, x_{i_k}\}$ is required. We
show how this can be accomplished without any matrix inversions or cross
products:


• Compute the QR factorization

\[
\begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R.
\]

• Compute the CS decomposition

\[
Q_1 = UCW^T \qquad Q_2 = VSW^T
\]

and order the diagonals of $C$ and $S$ so that

\[
\{c_1/s_1, \ldots, c_k/s_k\} = \{c_{i_1}/s_{i_1}, \ldots, c_{i_k}/s_{i_k}\}.
\]

• Compute orthogonal $Z$ and upper triangular $T$ so $TZ = W^TR$. (See
P8.7.5.) Note that if $X^{-1} = W^TR = TZ$, then $X = Z^TT^{-1}$ and so
the first $k$ rows of $Z$ are an orthonormal basis for $\mathrm{span}\{x_1, \ldots, x_k\}$.


Problems 


P8.7.1 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and $G \in \mathbb{R}^{n\times n}$ is lower triangular and
nonsingular. Give an efficient algorithm for computing $C = G^{-1}AG^{-T}$.

P8.7.2 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and $B \in \mathbb{R}^{n\times n}$ is symmetric positive definite.
Give an algorithm for computing the eigenvalues of $AB$ that uses the Cholesky
factorization and the symmetric QR algorithm.

P8.7.3 Show that if $C$ is real and diagonalizable, then there exist symmetric matrices $A$
and $B$, $B$ nonsingular, such that $C = AB^{-1}$. This shows that symmetric pencils $A - \lambda B$
are essentially general.

P8.7.4 Show how to convert an $Ax = \lambda Bx$ problem into a generalized singular value
problem if $A$ and $B$ are both symmetric and non-negative definite.

P8.7.5 Given $Y \in \mathbb{R}^{n\times n}$ show how to compute Householder matrices $H_2, \ldots, H_n$ so
that $YH_n \cdots H_2 = T$ is upper triangular. Hint: $H_k$ zeros out the $k$th row.


P8.7.6 Suppose

\[
\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix}
= \lambda \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix}
\]

where $A \in \mathbb{R}^{m\times n}$, $B_1 \in \mathbb{R}^{m\times m}$, and $B_2 \in \mathbb{R}^{n\times n}$. Assume that $B_1$ and $B_2$ are positive
definite with Cholesky triangles $G_1$ and $G_2$ respectively. Relate the generalized
eigenvalues of this problem to the singular values of $G_1^{-1}AG_2^{-T}$.


P8.7.7 Suppose $A$ and $B$ are both symmetric positive definite. Show how to compute
$\lambda(A, B)$ and the corresponding eigenvectors using the Cholesky factorization and CS
decomposition.


Notes and References for Sec. 8.7 


An excellent survey of computational methods for symmetric-definite pencils is given in

Chapter 9


Lanczos Methods


§9.1 Derivation and Convergence Properties
§9.2 Practical Lanczos Procedures
§9.3 Applications to Ax = b and Least Squares
§9.4 Arnoldi and Unsymmetric Lanczos

In this chapter we develop the Lanczos method, a technique that can be
used to solve certain large, sparse, symmetric eigenproblems $Ax = \lambda x$. The
method involves partial tridiagonalizations of the given matrix $A$. However,
unlike the Householder approach, no intermediate, full submatrices
are generated. Equally important, information about $A$'s extremal
eigenvalues tends to emerge long before the tridiagonalization is complete. This
makes the Lanczos algorithm particularly useful in situations where a few
of $A$'s largest or smallest eigenvalues are desired.

The derivation and exact arithmetic attributes of the method are presented
in §9.1. The key aspects of the Kaniel-Paige theory are detailed.
This theory explains the extraordinary convergence properties of the Lanczos
process. Unfortunately, roundoff errors make the Lanczos method somewhat
difficult to use in practice. The central problem is a loss of orthogonality
among the Lanczos vectors that the iteration produces. There are
several ways to cope with this as we discuss in §9.2.

In §9.3 we show how the "Lanczos idea" can be applied to solve an
assortment of singular value, least squares, and linear equations problems. Of
particular interest is the development of the conjugate gradient method for
symmetric positive definite linear systems. The Lanczos-conjugate gradient
connection is explored further in the next chapter. In §9.4 we discuss the
Arnoldi iteration which is based on the Hessenberg decomposition and a
version of the Lanczos process that can (sometimes) be used to tridiagonalize
unsymmetric matrices.


Before You Begin 


Chapters 5 and 8 are required for §9.1-9.3 and Chapter 7 is needed for
§9.4. Within this chapter there are the following dependencies:


    §9.1  →  §9.2  →  §9.3
      ↓
    §9.4


A wide range of Lanczos papers are collected in Brown, Chu, Ellison, and 
Plemmons (1994). Other complementary references include Parlett (1980), 
Saad (1992), and Chatelin (1993). The two volume work by Cullum and 
Willoughby (1985a,1985b) includes both analysis and software. 


9.1 Derivation and Convergence Properties 


Suppose $A \in \mathbb{R}^{n\times n}$ is large, sparse, and symmetric and assume that a few
of its largest and/or smallest eigenvalues are desired. This problem can be
solved by a method attributed to Lanczos (1950). The method generates
a sequence of tridiagonal matrices $T_k$ with the property that the extremal
eigenvalues of $T_k \in \mathbb{R}^{k\times k}$ are progressively better estimates of $A$'s extremal
eigenvalues. In this section, we derive the technique and investigate its
exact arithmetic properties. Throughout the section $\lambda_i(\cdot)$ designates the
$i$th largest eigenvalue.


9.1.1 Krylov Subspaces 


The derivation of the Lanczos algorithm can proceed in several ways. So
that its remarkable convergence properties do not come as a complete surprise,
we prefer to lead into the technique by considering the optimization
of the Rayleigh quotient

\[
r(x) = \frac{x^TAx}{x^Tx}, \qquad x \ne 0.
\]

Recall from Theorem 8.1.2 that the maximum and minimum values of $r(x)$
are $\lambda_1(A)$ and $\lambda_n(A)$, respectively. Suppose $\{q_i\} \subseteq \mathbb{R}^n$ is a sequence of
orthonormal vectors and define the scalars $M_k$ and $m_k$ by

\[
M_k = \lambda_1(Q_k^TAQ_k) = \max_{y \ne 0} \frac{y^T(Q_k^TAQ_k)y}{y^Ty} = \max_{\|y\|_2 = 1} r(Q_ky) \le \lambda_1(A)
\]

\[
m_k = \lambda_k(Q_k^TAQ_k) = \min_{y \ne 0} \frac{y^T(Q_k^TAQ_k)y}{y^Ty} = \min_{\|y\|_2 = 1} r(Q_ky) \ge \lambda_n(A)
\]

where $Q_k = [q_1, \ldots, q_k]$. The Lanczos algorithm can be derived by considering
how to generate the $q_k$ so that $M_k$ and $m_k$ are increasingly better
estimates of $\lambda_1(A)$ and $\lambda_n(A)$.

Suppose $u_k \in \mathrm{span}\{q_1, \ldots, q_k\}$ is such that $M_k = r(u_k)$. Since $r(x)$
increases most rapidly in the direction of the gradient

\[
\nabla r(x) = \frac{2}{x^Tx}\bigl(Ax - r(x)x\bigr),
\]

we can ensure that $M_{k+1} > M_k$ if $q_{k+1}$ is determined so

\[
\nabla r(u_k) \in \mathrm{span}\{q_1, \ldots, q_{k+1}\}. \qquad (9.1.1)
\]

(This assumes $\nabla r(u_k) \ne 0$.) Likewise, if $v_k \in \mathrm{span}\{q_1, \ldots, q_k\}$ satisfies
$r(v_k) = m_k$, then it makes sense to require

\[
\nabla r(v_k) \in \mathrm{span}\{q_1, \ldots, q_{k+1}\} \qquad (9.1.2)
\]

since $r(x)$ decreases most rapidly in the direction of $-\nabla r(x)$.

At first glance, the task of finding a single $q_{k+1}$ that satisfies these two
requirements appears impossible. However, since $\nabla r(x) \in \mathrm{span}\{x, Ax\}$, it
is clear that (9.1.1) and (9.1.2) can be simultaneously satisfied if

\[
\mathrm{span}\{q_1, \ldots, q_k\} = \mathrm{span}\{q_1, Aq_1, \ldots, A^{k-1}q_1\}
\]

and we choose $q_{k+1}$ so

\[
\mathrm{span}\{q_1, \ldots, q_{k+1}\} = \mathrm{span}\{q_1, Aq_1, \ldots, A^{k-1}q_1, A^kq_1\}.
\]

Thus, we are led to the problem of computing orthonormal bases for the
Krylov subspaces

\[
\mathcal{K}(A, q_1, k) = \mathrm{span}\{q_1, Aq_1, \ldots, A^{k-1}q_1\}.
\]

These are just the range spaces of the Krylov matrices

\[
K(A, q_1, n) = [\, q_1,\ Aq_1,\ A^2q_1,\ \ldots,\ A^{n-1}q_1 \,]
\]

presented in §8.3.2.


9.1.2  Tridiagonalization 


In order to find this basis efficiently we exploit the connection between the
tridiagonalization of $A$ and the QR factorization of $K(A, q_1, n)$. Recall that
if $Q^TAQ = T$ is tridiagonal with $Qe_1 = q_1$, then

\[
K(A, q_1, n) = Q\,[\, e_1,\ Te_1,\ T^2e_1,\ \ldots,\ T^{n-1}e_1 \,]
\]

is the QR factorization of $K(A, q_1, n)$ where $e_1 = I_n(:, 1)$. Thus the $q_k$ can
effectively be generated by tridiagonalizing $A$ with an orthogonal matrix
whose first column is $q_1$.

Householder tridiagonalization, discussed in §8.3.1, can be adapted for
this purpose. However, this approach is impractical if $A$ is large and sparse
because Householder similarity transformations tend to destroy sparsity.
As a result, unacceptably large, dense matrices arise during the reduction.

Loss of sparsity can sometimes be controlled by using Givens rather
than Householder transformations. See Duff and Reid (1976). However,
any method that computes $T$ by successively updating $A$ is not useful in
the majority of cases when $A$ is sparse.

This suggests that we try to compute the elements of the tridiagonal
matrix $T = Q^TAQ$ directly. Setting $Q = [q_1, \ldots, q_n]$ and

\[
T = \begin{bmatrix}
\alpha_1 & \beta_1 & & \cdots & 0 \\
\beta_1 & \alpha_2 & \ddots & & \vdots \\
 & \ddots & \ddots & \ddots & \\
\vdots & & \ddots & \ddots & \beta_{n-1} \\
0 & \cdots & & \beta_{n-1} & \alpha_n
\end{bmatrix}
\]

and equating columns in $AQ = QT$, we find

\[
Aq_k = \beta_{k-1}q_{k-1} + \alpha_kq_k + \beta_kq_{k+1}, \qquad \beta_0q_0 \equiv 0
\]

for $k = 1{:}n-1$. The orthonormality of the $q_i$ implies $\alpha_k = q_k^TAq_k$.
Moreover, if $r_k = (A - \alpha_kI)q_k - \beta_{k-1}q_{k-1}$ is nonzero, then $q_{k+1} = r_k/\beta_k$
where $\beta_k = \|r_k\|_2$. If $r_k = 0$, then the iteration breaks down but (as
we shall see) not without the acquisition of valuable invariant subspace
information. So by properly sequencing the above formulae we obtain the
Lanczos iteration:


    $r_0 = q_1$; $\beta_0 = 1$; $q_0 = 0$; $k = 0$
    while ($\beta_k \ne 0$)
        $q_{k+1} = r_k/\beta_k$; $k = k + 1$; $\alpha_k = q_k^TAq_k$                    (9.1.3)
        $r_k = (A - \alpha_kI)q_k - \beta_{k-1}q_{k-1}$; $\beta_k = \|r_k\|_2$
    end


There is no loss of generality in choosing the $\beta_k$ to be positive. The $q_k$ are
called Lanczos vectors.
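Iteration (9.1.3) translates almost line for line into NumPy. The sketch below stops after a prescribed number of steps `k` (an added parameter), touches $A$ only through matrix-vector products, and performs no reorthogonalization, so it is meant for small illustrative runs. The names `lanczos` and `tridiag` are hypothetical.

```python
import numpy as np

def lanczos(A, q1, k):
    """Run at most k steps of the Lanczos iteration (9.1.3).

    Returns Q_k (n-by-k), alpha (alpha_1..alpha_k), beta (beta_1..beta_k), r_k.
    """
    n = A.shape[0]
    Q, alpha, beta = [], [], []
    q_prev = np.zeros(n)                 # q_0 = 0
    beta_prev = 1.0                      # beta_0 = 1
    r = q1 / np.linalg.norm(q1)          # r_0 = q_1 (normalized)
    for _ in range(k):
        if beta_prev == 0.0:
            break                        # breakdown: invariant subspace found
        q = r / beta_prev
        a = q @ (A @ q)
        r = A @ q - a * q - beta_prev * q_prev
        b = np.linalg.norm(r)
        Q.append(q)
        alpha.append(a)
        beta.append(b)
        q_prev, beta_prev = q, b
    return np.column_stack(Q), np.array(alpha), np.array(beta), r

def tridiag(alpha, beta):
    """Assemble T_k from alpha_1..alpha_k and beta_1..beta_{k-1}."""
    return np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
```

With `Q, alpha, beta, r = lanczos(A, q1, k)` and `T = tridiag(alpha, beta)`, the matrix `A @ Q - Q @ T` is zero except for its last column, which equals `r`.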


9.1.3 Termination and Error Bounds 


The iteration halts before complete tridiagonalization if $q_1$ is contained in
a proper invariant subspace. This is one of several mathematical properties
of the method that we summarize in the following theorem.

Theorem 9.1.1 Let $A \in \mathbb{R}^{n\times n}$ be symmetric and assume $q_1 \in \mathbb{R}^n$ has unit
2-norm. Then the Lanczos iteration (9.1.3) runs until $k = m$, where $m =
\mathrm{rank}(K(A, q_1, n))$. Moreover, for $k = 1{:}m$ we have

\[
AQ_k = Q_kT_k + r_ke_k^T \qquad (9.1.4)
\]

where

\[
T_k = \begin{bmatrix}
\alpha_1 & \beta_1 & & \cdots & 0 \\
\beta_1 & \alpha_2 & \ddots & & \vdots \\
 & \ddots & \ddots & \ddots & \\
\vdots & & \ddots & \ddots & \beta_{k-1} \\
0 & \cdots & & \beta_{k-1} & \alpha_k
\end{bmatrix}
\]

and $Q_k = [q_1, \ldots, q_k]$ has orthonormal columns that span $\mathcal{K}(A, q_1, k)$.


Proof. The proof is by induction on $k$. Suppose the iteration has produced
$Q_k = [q_1, \ldots, q_k]$ such that $\mathrm{ran}(Q_k) = \mathcal{K}(A, q_1, k)$ and $Q_k^TQ_k = I_k$. It is
easy to see from (9.1.3) that (9.1.4) holds. Thus, $Q_k^TAQ_k = T_k + Q_k^Tr_ke_k^T$.
Since $\alpha_i = q_i^TAq_i$ for $i = 1{:}k$ and

\[
q_{i+1}^TAq_i = q_{i+1}^T(Aq_i - \alpha_iq_i - \beta_{i-1}q_{i-1}) = q_{i+1}^T(\beta_iq_{i+1}) = \beta_i
\]

for $i = 1{:}k-1$, we have $Q_k^TAQ_k = T_k$. Consequently, $Q_k^Tr_k = 0$.
If $r_k \ne 0$, then $q_{k+1} = r_k/\|r_k\|_2$ is orthogonal to $q_1, \ldots, q_k$ and

\[
q_{k+1} \in \mathrm{span}\{Aq_k, q_k, q_{k-1}\} \subseteq \mathcal{K}(A, q_1, k+1).
\]

Thus, $Q_{k+1}^TQ_{k+1} = I_{k+1}$ and $\mathrm{ran}(Q_{k+1}) = \mathcal{K}(A, q_1, k+1)$. On the other
hand, if $r_k = 0$, then $AQ_k = Q_kT_k$. This says that $\mathrm{ran}(Q_k) = \mathcal{K}(A, q_1, k)$
is invariant. From this we conclude that $k = m = \mathrm{rank}(K(A, q_1, n))$. □


Encountering a zero $\beta_k$ in the Lanczos iteration is a welcome event in that it
signals the computation of an exact invariant subspace. However, an exact
zero or even a small $\beta_k$ is a rarity in practice. Nevertheless, the extremal
eigenvalues of $T_k$ turn out to be surprisingly good approximations to $A$'s
extremal eigenvalues. Consequently, other explanations for the convergence
of $T_k$'s eigenvalues must be sought. The following result is a step in this
direction.


Theorem 9.1.2 Suppose that $k$ steps of the Lanczos algorithm have been
performed and that $S_k^TT_kS_k = \mathrm{diag}(\theta_1, \ldots, \theta_k)$ is the Schur decomposition
of the tridiagonal matrix $T_k$. If $Y_k = [y_1, \ldots, y_k] = Q_kS_k \in \mathbb{R}^{n\times k}$, then
for $i = 1{:}k$ we have $\|Ay_i - \theta_iy_i\|_2 = |\beta_k|\,|s_{ki}|$ where $S_k = (s_{pq})$.


Proof. Post-multiplying (9.1.4) by $S_k$ gives

\[
AY_k = Y_k\,\mathrm{diag}(\theta_1, \ldots, \theta_k) + r_ke_k^TS_k,
\]

and so $Ay_i = \theta_iy_i + r_k(e_k^TS_ke_i)$. The proof is complete by taking norms
and recalling that $\|r_k\|_2 = |\beta_k|$. □


The theorem provides computable error bounds for $T_k$'s eigenvalues:

\[
\min_{\mu \in \lambda(A)} |\theta_i - \mu| \le |\beta_k|\,|s_{ki}|, \qquad i = 1{:}k.
\]

Note that in the terminology of Theorem 8.1.15, the $(\theta_i, y_i)$ are Ritz pairs
for the subspace $\mathrm{ran}(Q_k)$.
Another way that $T_k$ can be used to provide estimates of $A$'s eigenvalues
is described in Golub (1974) and involves the judicious construction of a
rank-one matrix $E$ such that $\mathrm{ran}(Q_k)$ is invariant for $A + E$. In particular,
if we use the Lanczos method to compute $A Q_k = Q_k T_k + r_k e_k^T$ and set
$E = \tau w w^T$, where $\tau = \pm 1$ and $w = a q_k + b r_k$, then it can be shown that

$$ (A + E) Q_k = Q_k (T_k + \tau a^2 e_k e_k^T) + (1 + \tau a b) r_k e_k^T. $$

If $0 = 1 + \tau a b$, then the eigenvalues of $\tilde{T}_k = T_k + \tau a^2 e_k e_k^T$, a tridiagonal
matrix, are also eigenvalues of $A + E$. Using Theorem 8.1.8 it can be shown
that the interval $[\lambda_i(\tilde{T}_k), \lambda_{i-1}(\tilde{T}_k)]$ contains an eigenvalue of $A$ for $i = 2{:}k$.

These bracketing intervals depend on the choice of $\tau a^2$. Suppose we
have an approximate eigenvalue $\tilde{\lambda}$ of $A$. One possibility is to choose
$\tau a^2$ so that

$$ \det(\tilde{T}_k - \tilde{\lambda} I_k) = (\alpha_k + \tau a^2 - \tilde{\lambda}) \, p_{k-1}(\tilde{\lambda}) - \beta_{k-1}^2 \, p_{k-2}(\tilde{\lambda}) = 0 $$

where the polynomials $p_i(x) = \det(T_i - x I_i)$ can be evaluated at $\tilde{\lambda}$ using the
three-term recurrence (8.5.2). (This assumes that $p_{k-1}(\tilde{\lambda}) \neq 0$.) Eigenvalue
estimation in this spirit is discussed in Lehmann (1963) and Householder
(1968).
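
The choice of $\tau a^2$ described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `char_poly_values` and `corner_shift` are hypothetical, the recurrence is written out as $p_i(x) = (\alpha_i - x) p_{i-1}(x) - \beta_{i-1}^2 p_{i-2}(x)$ with $p_0 = 1$, and no safeguards against overflow or a nearly vanishing $p_{k-1}$ are included.

```python
def char_poly_values(alpha, beta, x):
    """Return [p_0(x), ..., p_k(x)] with p_i(x) = det(T_i - x I_i).

    alpha holds alpha_1..alpha_k (diagonal) and beta holds
    beta_1..beta_{k-1} (off-diagonal) of the tridiagonal T_k.
    """
    p = [1.0, alpha[0] - x]
    for i in range(1, len(alpha)):
        # p_{i+1} = (alpha_{i+1} - x) p_i - beta_i^2 p_{i-1}
        p.append((alpha[i] - x) * p[i] - beta[i - 1] ** 2 * p[i - 1])
    return p


def corner_shift(alpha, beta, lam):
    """Value of tau*a^2 making lam an eigenvalue of T_k + tau*a^2 e_k e_k^T."""
    k = len(alpha)
    if k == 1:
        return lam - alpha[0]
    p = char_poly_values(alpha, beta, lam)
    if p[k - 1] == 0.0:
        raise ZeroDivisionError("p_{k-1}(lam) = 0, so the shift is undefined")
    return lam - alpha[-1] + beta[-1] ** 2 * p[k - 2] / p[k - 1]


# Usage: shift the (k,k) entry so that 3.5 becomes an eigenvalue.
alpha, beta = [2.0, 2.0, 2.0], [1.0, 1.0]
shift = corner_shift(alpha, beta, 3.5)
```

The sign of the returned value fixes $\tau$ and its magnitude fixes $a^2$; $b$ then follows from $1 + \tau a b = 0$.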


9.1.4 The Kaniel-Paige Convergence Theory 


The preceding discussion indicates how eigenvalue estimates can be obtained
via the Lanczos algorithm, but it reveals nothing about rate of convergence.
Results of this variety constitute what is known as the Kaniel-Paige
theory, a sample of which follows.


Theorem 9.1.3 Let $A$ be an n-by-n symmetric matrix with eigenvalues
$\lambda_1 \geq \cdots \geq \lambda_n$ and corresponding orthonormal eigenvectors $z_1, \ldots, z_n$. If
$\theta_1 \geq \cdots \geq \theta_k$ are the eigenvalues of the matrix $T_k$ obtained after k steps of
the Lanczos iteration, then

$$ \lambda_1 \geq \theta_1 \geq \lambda_1 - \frac{(\lambda_1 - \lambda_n) \tan(\phi_1)^2}{(c_{k-1}(1 + 2\rho_1))^2} $$

where $\cos(\phi_1) = |q_1^T z_1|$, $\rho_1 = (\lambda_1 - \lambda_2)/(\lambda_2 - \lambda_n)$, and $c_{k-1}(x)$ is the
Chebyshev polynomial of degree $k - 1$.

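Proof. From Theorem 8.1.2, we have

$$ \theta_1 = \max_{y \neq 0} \frac{y^T T_k y}{y^T y} = \max_{y \neq 0} \frac{(Q_k y)^T A (Q_k y)}{(Q_k y)^T (Q_k y)} = \max_{0 \neq w \in \mathcal{K}(A, q_1, k)} \frac{w^T A w}{w^T w}. $$

Since $\lambda_1$ is the maximum of $w^T A w / w^T w$ over all nonzero $w$, it follows that
$\lambda_1 \geq \theta_1$. To obtain the lower bound for $\theta_1$, note that

$$ \theta_1 = \max_{p \in \mathcal{P}_{k-1}} \frac{q_1^T p(A) A p(A) q_1}{q_1^T p(A)^2 q_1} $$

where $\mathcal{P}_{k-1}$ is the set of degree $k - 1$ polynomials. If $q_1 = \sum_{i=1}^{n} d_i z_i$ then

$$ \frac{q_1^T p(A) A p(A) q_1}{q_1^T p(A)^2 q_1} = \frac{\sum_{i=1}^{n} d_i^2 p(\lambda_i)^2 \lambda_i}{\sum_{i=1}^{n} d_i^2 p(\lambda_i)^2} \geq \lambda_1 - (\lambda_1 - \lambda_n) \frac{\sum_{i=2}^{n} d_i^2 p(\lambda_i)^2}{d_1^2 p(\lambda_1)^2 + \sum_{i=2}^{n} d_i^2 p(\lambda_i)^2}. $$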

We can make the lower bound tight by selecting a polynomial $p(x)$ that is
large at $x = \lambda_1$ in comparison to its value at the remaining eigenvalues.
One way of doing this is to set

$$ p(x) = c_{k-1}\left( -1 + 2 \, \frac{x - \lambda_n}{\lambda_2 - \lambda_n} \right) $$

where $c_{k-1}(z)$ is the $(k-1)$-st Chebyshev polynomial generated via the
recursion

$$ c_k(z) = 2 z c_{k-1}(z) - c_{k-2}(z), \qquad c_0 = 1, \quad c_1 = z. $$

These polynomials are bounded by unity on $[-1, 1]$, but grow very rapidly
outside this interval. By defining $p(x)$ this way it follows that $|p(\lambda_i)|$ is
bounded by unity for $i = 2{:}n$, while $p(\lambda_1) = c_{k-1}(1 + 2\rho_1)$. Thus,

$$ \theta_1 \geq \lambda_1 - (\lambda_1 - \lambda_n) \, \frac{1 - d_1^2}{d_1^2} \, \frac{1}{c_{k-1}(1 + 2\rho_1)^2}. $$

The desired lower bound is obtained by noting that $\tan(\phi_1)^2 = (1 - d_1^2)/d_1^2$. $\Box$


An analogous result pertaining to $\theta_k$ follows immediately from this theorem:

Corollary 9.1.4 Using the same notation as the theorem,

$$ \lambda_n \leq \theta_k \leq \lambda_n + \frac{(\lambda_1 - \lambda_n) \tan(\phi_n)^2}{c_{k-1}(1 + 2\rho_n)^2} $$

where $\rho_n = (\lambda_{n-1} - \lambda_n)/(\lambda_1 - \lambda_{n-1})$ and $\cos(\phi_n) = |q_1^T z_n|$.


Proof. Apply Theorem 9.1.3 with $A$ replaced by $-A$. $\Box$


9.1.5 The Power Method Versus the Lanczos Method 


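It is worthwhile to compare $\theta_1$ with the corresponding power method
estimate of $\lambda_1$. (See §8.2.1.) For clarity, assume $\lambda_1 \geq \cdots \geq \lambda_n \geq 0$. After $k - 1$
power method steps applied to $q_1$, a vector is obtained in the direction of

$$ v = A^{k-1} q_1 = \sum_{i=1}^{n} d_i \lambda_i^{k-1} z_i $$

along with an eigenvalue estimate

$$ \gamma_1 = \frac{v^T A v}{v^T v}. $$

Using the proof and notation of Theorem 9.1.3, it is easy to show that

$$ \lambda_1 \geq \gamma_1 \geq \lambda_1 - (\lambda_1 - \lambda_n) \tan(\phi_1)^2 \left( \frac{\lambda_2}{\lambda_1} \right)^{2(k-1)} \tag{9.1.5} $$

(Hint: Set $p(x) = x^{k-1}$ in the proof.) Thus, we can compare the quality of
the lower bounds for $\theta_1$ and $\gamma_1$ by comparing

$$ L_{k-1} = 1 \Big/ \left[ c_{k-1}\left( 2 \frac{\lambda_1}{\lambda_2} - 1 \right) \right]^2 \geq 1 \Big/ \left[ c_{k-1}(1 + 2\rho_1) \right]^2 $$

and

$$ R_{k-1} = \left( \frac{\lambda_2}{\lambda_1} \right)^{2(k-1)}. $$

This is done in the following table for representative values of $k$ and $\lambda_1/\lambda_2$.

The superiority of the Lanczos estimate is self-evident. This should
be no surprise, since $\theta_1$ is the maximum of $r(x) = x^T A x / x^T x$ over all of
$\mathcal{K}(A, q_1, k)$, while $\gamma_1 = r(v)$ for a particular $v$ in $\mathcal{K}(A, q_1, k)$, namely
$v = A^{k-1} q_1$.
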
lambda_1/lambda_2             k = 5        k = 10       k = 15       k = 20       k = 25

1.50            L_{k-1}    1.1x10^-4    2.0x10^-10   3.9x10^-16   7.4x10^-22   1.4x10^-27
                R_{k-1}    3.9x10^-2    6.8x10^-4    1.2x10^-5    2.0x10^-7    3.5x10^-9

1.10            L_{k-1}    2.7x10^-2    5.5x10^-5    1.1x10^-7    2.1x10^-10   4.2x10^-13
                R_{k-1}    4.7x10^-1    1.8x10^-1    6.9x10^-2    2.7x10^-2    1.0x10^-2

1.01            L_{k-1}    5.6x10^-1    1.0x10^-1    1.5x10^-2    2.0x10^-3    2.8x10^-4
                R_{k-1}    9.2x10^-1    8.4x10^-1    7.6x10^-1    6.9x10^-1    6.2x10^-1

TABLE 9.1.1  L_{k-1} / R_{k-1}

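
The entries of Table 9.1.1 can be regenerated from the definitions of $L_{k-1}$ and $R_{k-1}$ and the Chebyshev recursion. The sketch below is one way to do it; the function names are hypothetical and the argument `ratio` stands for $\lambda_1/\lambda_2$.

```python
def chebyshev(k, z):
    """c_k(z) via c_k = 2 z c_{k-1} - c_{k-2}, with c_0 = 1 and c_1 = z."""
    if k == 0:
        return 1.0
    c_prev, c = 1.0, z
    for _ in range(k - 1):
        c_prev, c = c, 2.0 * z * c - c_prev
    return c


def lanczos_factor(k, ratio):
    """L_{k-1} = 1 / c_{k-1}(2*ratio - 1)^2, where ratio = lambda_1/lambda_2."""
    return 1.0 / chebyshev(k - 1, 2.0 * ratio - 1.0) ** 2


def power_factor(k, ratio):
    """R_{k-1} = (lambda_2/lambda_1)^(2(k-1))."""
    return (1.0 / ratio) ** (2 * (k - 1))


for ratio in (1.50, 1.10, 1.01):
    for k in (5, 10, 15, 20, 25):
        print(f"{ratio:5.2f} k={k:2d}  L={lanczos_factor(k, ratio):.1e}"
              f"  R={power_factor(k, ratio):.1e}")
```
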
9.1.6 Convergence of Interior Eigenvalues 


We conclude with some remarks about error bounds for $T_k$'s interior
eigenvalues. The key idea in the proof of Theorem 9.1.3 is the use of the
translated Chebyshev polynomial. With this polynomial we amplified the
component of $q_1$ in the direction $z_1$. A similar idea can be used to obtain bounds
for an interior Ritz value $\theta_i$. However, the bounds are not as satisfactory
because the "amplifying polynomial" has the form $q(x) \prod_{j=1}^{i-1} (x - \lambda_j)$, where
$q(x)$ is the degree $k - i$ Chebyshev polynomial on the interval
$[\lambda_{i+1}, \lambda_n]$. For details, see Kaniel (1966), Paige (1971), or Saad (1980).


Problems 


P9.1.1 Suppose $A \in \mathbb{R}^{n \times n}$ is skew-symmetric. Derive a Lanczos-like algorithm for
computing a skew-symmetric tridiagonal matrix $T_m$ such that $A Q_m = Q_m T_m$, where
$Q_m^T Q_m = I_m$.

P9.1.2 Let $A \in \mathbb{R}^{n \times n}$ be symmetric and define $r(x) = x^T A x / x^T x$. Suppose $S \subseteq \mathbb{R}^n$
is a subspace with the property that $x \in S$ implies $\nabla r(x) \in S$. Show that $S$ is invariant
for $A$.

P9.1.3 Show that if a symmetric matrix $A \in \mathbb{R}^{n \times n}$ has a multiple eigenvalue, then the
Lanczos iteration terminates prematurely.

P9.1.4 Show that the index $m$ in Theorem 9.1.1 is the dimension of the smallest
invariant subspace for $A$ that contains $q_1$.

P9.1.5 Let $A \in \mathbb{R}^{n \times n}$ be symmetric and consider the problem of determining an
orthonormal sequence $q_1, q_2, \ldots$ with the property that once $Q_k = [q_1, \ldots, q_k]$ is known,
$q_{k+1}$ is chosen so as to minimize $\mu_k = \| (I - Q_{k+1} Q_{k+1}^T) A Q_k \|$. Show that if
$\mathrm{span}\{q_1, \ldots, q_k\} = \mathcal{K}(A, q_1, k)$, then it is possible to choose $q_{k+1}$ so $\mu_k = 0$. Explain
how this optimization problem leads to the Lanczos iteration.

P9.1.6 Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric and that we wish to compute its largest
eigenvalue. Let $\eta$ be an approximate eigenvector and set

$$ \alpha = \frac{\eta^T A \eta}{\eta^T \eta}, \qquad z = A\eta - \alpha\eta. $$

(a) Show that the interval $[\alpha - \delta, \alpha + \delta]$ must contain an eigenvalue of $A$ where
$\delta = \| z \|_2 / \| \eta \|_2$. (b) Consider the new approximation $\bar{\eta} = a\eta + bz$ and show how to
determine the scalars $a$ and $b$ so that

$$ \bar{\alpha} = \frac{\bar{\eta}^T A \bar{\eta}}{\bar{\eta}^T \bar{\eta}} $$

is maximized. (c) Relate the above computations to the first two steps of the Lanczos
process.


Notes and References for Sec. 9.1 


The classic reference for the Lanczos method is 


C. Lanczos (1950). “An Iteration Method for the Solution of the Eigenvalue Problem of 
Linear Differential and Integral Operators,” J. Res. Nat. Bur. Stand. 45, 255-82. 


Although the convergence of the Ritz values is alluded to in this paper, for more details we
refer the reader to


S. Kaniel (1966). “Estimates for Some Computational Techniques in Linear Algebra,”
Math. Comp. 20, 369-78.

C.C. Paige (1971). “The Computation of Eigenvalues and Eigenvectors of Very Large
Sparse Matrices,” Ph.D. thesis, London University.

Y. Saad (1980). “On the Rates of Convergence of the Lanczos and the Block Lanczos
Methods,” SIAM J. Num. Anal. 17, 687-706.


The connections between the Lanczos algorithm, orthogonal polynomials, and the theory 
of moments are discussed in 


N.J. Lehmann (1963). “Optimale Eigenwerteinschliessungen,” Numer. Math. 5, 246-72.

A.S. Householder (1968). “Moments and Characteristic Roots II,” Numer. Math. 11,
126-28.

G.H. Golub (1974). “Some Uses of the Lanczos Algorithm in Numerical Linear Algebra,” 
in Topics in Numerical Analysis, ed., J.J.H. Miller, Academic Press, New York. 


We motivated our discussion of the Lanczos algorithm by discussing the inevitability of 
fill-in when Householder or Givens transformations are used to tridiagonalize. Actually, 
fill-in can sometimes be kept to an acceptable level if care is exercised. See 


I.S. Duff (1974). “Pivot Selection and Row Ordering in Givens Reduction on Sparse
Matrices,” Computing 13, 239-48.

I.S. Duff and J.K. Reid (1976). “A Comparison of Some Methods for the Solution of
Sparse Over-Determined Systems of Linear Equations,” J. Inst. Maths. Applic. 17,
267-80.

L. Kaufman (1979). “Application of Dense Householder Transformations to a Sparse
Matrix,” ACM Trans. Math. Soft. 5, 442-50.


9.2 Practical Lanczos Procedures 


Rounding errors greatly affect the behavior of the Lanczos iteration. The
basic difficulty is caused by loss of orthogonality among the Lanczos vectors,
a phenomenon that muddies the issue of termination and complicates the
relationship between $A$'s eigenvalues and those of the tridiagonal matrices
$T_k$. This troublesome feature, coupled with the advent of Householder's
perfectly stable method of tridiagonalization, explains why the Lanczos
algorithm was disregarded by numerical analysts during the 1950's and
1960's. However, interest in the method was rejuvenated with the
development of the Kaniel-Paige theory and because the pressure to solve large,
sparse eigenproblems increased with increased computer power. With many
fewer than n iterations typically required to get good approximate extremal
eigenvalues, the Lanczos method became attractive as a sparse matrix
technique rather than as a competitor of the Householder approach.

Successful implementations of the Lanczos iteration involve much more
than a simple encoding of (9.1.3). In this section we outline some of the
practical ideas that have been proposed to make the Lanczos procedure viable
in practice.


9.2.1 Exact Arithmetic Implementation 


With careful overwriting in (9.1.3) and exploitation of the formula

$$ \alpha_k = q_k^T (A q_k - \beta_{k-1} q_{k-1}), $$

the whole Lanczos process can be implemented with just two n-vectors of
storage.


Algorithm 9.2.1 (The Lanczos Algorithm) Given a symmetric
$A \in \mathbb{R}^{n \times n}$ and $w \in \mathbb{R}^n$ having unit 2-norm, the following algorithm
computes a k-by-k symmetric tridiagonal matrix $T_k$ with the property that
$\lambda(T_k) \subset \lambda(A)$. It assumes the existence of a function A.mult(w) that
returns the matrix-vector product $Aw$. The diagonal and subdiagonal
elements of $T_k$ are stored in $\alpha(1{:}k)$ and $\beta(1{:}k-1)$ respectively.


    $v(1{:}n) = 0$; $\beta_0 = 1$; $k = 0$
    while $\beta_k \neq 0$
        if $k \neq 0$
            for $i = 1{:}n$
                $t = w_i$; $w_i = v_i / \beta_k$; $v_i = -\beta_k t$
            end
        end
        $v = v + $ A.mult($w$)
        $k = k + 1$; $\alpha_k = w^T v$; $v = v - \alpha_k w$; $\beta_k = \| v \|_2$
    end


Note that $A$ is not altered during the entire process. Only a procedure
A.mult(·) for computing matrix-vector products involving $A$ need be
supplied. If $A$ has an average of about $\nu$ nonzeros per row, then approximately
$(2\nu + 8)n$ flops are involved in a single Lanczos step.

Upon termination the eigenvalues of $T_k$ can be found using the symmetric
tridiagonal QR algorithm or any of the special methods of §8.5, such as
bisection.

The Lanczos vectors are generated in the n-vector $w$. If they are desired
for later use, then special arrangements must be made for their storage. In
the typical sparse matrix setting they could be stored on a disk or some
other secondary storage device until required.
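
As an illustration, the two-vector overwriting scheme of Algorithm 9.2.1 might be transcribed into Python as follows. This is a sketch, not a production routine: the name `lanczos`, the `max_steps` cap and the `tol` threshold are assumptions added here because an exactly zero $\beta_k$ is rare in floating point, and `matvec` plays the role of A.mult.

```python
import math


def lanczos(matvec, w, max_steps, tol=0.0):
    """Two-vector Lanczos iteration patterned on Algorithm 9.2.1.

    matvec(w) returns A*w for symmetric A; w is a unit 2-norm start vector.
    Returns (alpha, beta) where alpha[j] = alpha_{j+1}, beta[j] = beta_{j+1}.
    """
    n = len(w)
    w = list(w)
    v = [0.0] * n
    alpha, beta = [], []
    b = 1.0  # plays the role of beta_0
    while len(alpha) < max_steps and b > tol:
        if alpha:  # k != 0: w becomes q_{k+1}, v becomes -beta_k * q_k
            for i in range(n):
                t = w[i]
                w[i] = v[i] / b
                v[i] = -b * t
        v = [vi + yi for vi, yi in zip(v, matvec(w))]
        a = sum(wi * vi for wi, vi in zip(w, v))   # alpha_k = w^T v
        v = [vi - a * wi for vi, wi in zip(v, w)]  # v = v - alpha_k w
        b = math.sqrt(sum(vi * vi for vi in v))    # beta_k = ||v||_2
        alpha.append(a)
        beta.append(b)
    return alpha, beta


# Usage with a small dense symmetric matrix stored as a list of rows.
A = [[2.0, 1.0], [1.0, 2.0]]
alpha, beta = lanczos(lambda x: [sum(a * xi for a, xi in zip(row, x)) for row in A],
                      [1.0, 0.0], max_steps=2)
```

Only `w` and `v` are kept between steps, which mirrors the storage claim made for the algorithm; the final entry of `beta` is the norm of the last residual vector.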


9.2.2 Roundoff Properties 


The development of a practical, easy-to-use Lanczos procedure requires 
an appreciation of the fundamental error analyses of Paige (1971, 1976, 
1980). An examination of his results is the best way to motivate the several 
modified Lanczos procedures of this section. 

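After k steps of the algorithm we obtain the matrix of computed Lanczos
vectors $\hat{Q}_k = [\hat{q}_1, \ldots, \hat{q}_k]$ and the associated tridiagonal matrix

$$ \hat{T}_k = \begin{bmatrix} \hat{\alpha}_1 & \hat{\beta}_1 & & \cdots & 0 \\ \hat{\beta}_1 & \hat{\alpha}_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & \hat{\beta}_{k-1} \\ 0 & \cdots & & \hat{\beta}_{k-1} & \hat{\alpha}_k \end{bmatrix}. $$

Paige (1971, 1976) shows that if $\hat{r}_k$ is the computed analog of $r_k$, then

$$ A \hat{Q}_k = \hat{Q}_k \hat{T}_k + \hat{r}_k e_k^T + E_k \tag{9.2.1} $$

where

$$ \| E_k \|_2 \approx u \| A \|_2. \tag{9.2.2} $$

This indicates that the important equation $A Q_k = Q_k T_k + r_k e_k^T$ is satisfied
to working precision.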

Unfortunately, the picture is much less rosy with respect to the
orthogonality among the $\hat{q}_i$. (Normality is not an issue. The computed Lanczos
vectors essentially have unit length.) If $\hat{\beta}_k = fl(\| \hat{r}_k \|_2)$ and we compute
$\hat{q}_{k+1} = fl(\hat{r}_k / \hat{\beta}_k)$, then a simple analysis shows that $\hat{\beta}_k \hat{q}_{k+1} \approx \hat{r}_k + w_k$
where $\| w_k \|_2 \approx u \| \hat{r}_k \|_2 \approx u \| A \|_2$. Thus, we may conclude that

$$ | \hat{q}_{k+1}^T \hat{q}_i | \approx \frac{| \hat{r}_k^T \hat{q}_i | + u \| A \|_2}{| \hat{\beta}_k |} $$

for $i = 1{:}k$. In other words, significant departures from orthogonality can
be expected when $\hat{\beta}_k$ is small, even in the ideal situation where $\hat{r}_k^T \hat{Q}_k$ is
zero. A small $\hat{\beta}_k$ implies cancellation in the computation of $\hat{r}_k$. We stress
that loss of orthogonality is due to this cancellation and is not the result of
the gradual accumulation of roundoff error.


Example 9.2.1 The matrix

$$ A = \begin{bmatrix} 2.64 & -.48 \\ -.48 & 2.36 \end{bmatrix} $$

has eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 2$. If the Lanczos algorithm is applied to this matrix
with $\hat{q}_1 = [.810, -.586]^T$ and three-digit floating point arithmetic is performed, then
$\hat{q}_2 = [.707, .707]^T$. Loss of orthogonality occurs because $\mathrm{span}\{\hat{q}_1\}$ is almost invariant
for $A$. (The vector $x = [.8, -.6]^T$ is the eigenvector affiliated with $\lambda_1$.)
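
The effect in Example 9.2.1 can be explored with Python's `decimal` module, which rounds every operation to a chosen number of significant digits. The sketch below is an assumption-laden experiment rather than a reproduction: the order of operations and the rounding mode (the module default) are choices made here, so the computed digits and signs need not agree with those quoted in the example. What it is meant to show is that the overlap $\hat{q}_1^T \hat{q}_2$ is nowhere near zero.

```python
from decimal import Decimal, getcontext

getcontext().prec = 3  # each Decimal operation is rounded to three significant digits

A = [[Decimal("2.64"), Decimal("-0.48")],
     [Decimal("-0.48"), Decimal("2.36")]]
q1 = [Decimal("0.810"), Decimal("-0.586")]


def dot(x, y):
    """Inner product with every product and partial sum rounded to 3 digits."""
    return sum((xi * yi for xi, yi in zip(x, y)), Decimal(0))


Aq = [dot(row, q1) for row in A]                     # A * q1
alpha1 = dot(q1, Aq)                                 # alpha_1 = q1^T A q1
r1 = [Aq[i] - alpha1 * q1[i] for i in range(2)]      # r_1 = A q1 - alpha_1 q1
beta1 = dot(r1, r1).sqrt()                           # beta_1 = ||r_1||_2
q2 = [ri / beta1 for ri in r1]                       # q_2 = r_1 / beta_1
overlap = dot(q1, q2)                                # would be 0 in exact arithmetic

print("q2 =", q2, " q1^T q2 =", overlap)
```

The residual `r1` is formed by subtracting nearly equal three-digit numbers, which is the cancellation that the surrounding discussion identifies as the source of the lost orthogonality.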


Further details of the Paige analysis are given shortly. Suffice it to
say now that loss of orthogonality always occurs in practice and with it,
an apparent deterioration in the quality of $\hat{T}_k$'s eigenvalues. This can be
quantified by combining (9.2.1) with Theorem 8.1.16. In particular, if in
that theorem we set $F_1 = \hat{r}_k e_k^T + E_k$, $X_1 = \hat{Q}_k$, $S = \hat{T}_k$, and assume that

$$ \tau = \| \hat{Q}_k^T \hat{Q}_k - I_k \|_2 $$

satisfies $\tau < 1$, then there exist eigenvalues $\mu_1, \ldots, \mu_k \in \lambda(A)$ such that

$$ | \mu_i - \lambda_i(\hat{T}_k) | \leq \sqrt{2} \left( \| \hat{r}_k \|_2 + \| E_k \|_2 + \tau (2 + \tau) \| A \|_2 \right) $$

for $i = 1{:}k$. An obvious way to control the $\tau$ factor is to orthogonalize
each newly computed Lanczos vector against its predecessors. This leads
directly to our first “practical” Lanczos procedure.


9.2.3 Lanczos with Complete Reorthogonalization 


Let $r_0, \ldots, r_{k-1} \in \mathbb{R}^n$ be given and suppose that Householder matrices
$H_0, \ldots, H_{k-1}$ have been computed such that $(H_0 \cdots H_{k-1})^T [r_0, \ldots, r_{k-1}]$
is upper triangular. Let $[q_1, \ldots, q_k]$ denote the first k columns of the
Householder product $(H_0 \cdots H_{k-1})$. Now suppose that we are given a
vector $r_k \in \mathbb{R}^n$ and wish to compute a unit vector $q_{k+1}$ in the direction of

$$ w = r_k - \sum_{i=1}^{k} (q_i^T r_k) q_i \in \mathrm{span}\{q_1, \ldots, q_k\}^{\perp}. $$

If a Householder matrix $H_k$ is determined so $(H_0 \cdots H_k)^T [r_0, \ldots, r_k]$ is
upper triangular, then it follows that column $(k+1)$ of $H_0 \cdots H_k$ is the
desired unit vector.

If we incorporate these Householder computations into the Lanczos
process, then we can produce Lanczos vectors that are orthogonal to machine
precision:

    $r_0 = q_1$ (given unit vector)
    Determine Householder $H_0$ so $H_0 r_0 = e_1$.
    $\alpha_1 = q_1^T A q_1$
    for $k = 1{:}n-1$
        $r_k = (A - \alpha_k I) q_k - \beta_{k-1} q_{k-1}$   ($\beta_0 q_0 \equiv 0$)            (9.2.3)
        $w = (H_{k-1} \cdots H_0) r_k$
        Determine Householder $H_k$ so $H_k w = (w_1, \ldots, w_k, \beta_k, 0, \ldots, 0)^T$.
        $q_{k+1} = H_0 \cdots H_k e_{k+1}$; $\alpha_{k+1} = q_{k+1}^T A q_{k+1}$
    end

This is an example of a complete reorthogonalization Lanczos scheme. A
thorough analysis may be found in Paige (1970). The idea of using
Householder matrices to enforce orthogonality appears in Golub, Underwood, and
Wilkinson (1972).

That the computed $\hat{q}_i$ in (9.2.3) are orthogonal to working precision
follows from the roundoff properties of Householder matrices. Note that by
virtue of the definition of $q_{k+1}$, it makes no difference if $\beta_k = 0$. For this
reason, the algorithm may safely run until $k = n - 1$. (However, in practice
one would terminate for a much smaller value of k.)

Of course, in any implementation of (9.2.3), one stores the Householder
vectors $v_k$ and never explicitly forms the corresponding $H_k$. Since we have
$H_k(1{:}k, 1{:}k) = I_k$, there is no need to compute the first k components of
$w = (H_{k-1} \cdots H_0) r_k$, for in exact arithmetic these components would be
zero.

Unfortunately, these economies make but a small dent in the computational
overhead associated with complete reorthogonalization. The Householder
calculations increase the work in the kth Lanczos step by O(kn)
flops. Moreover, to compute $q_{k+1}$, the Householder vectors associated with
$H_0, \ldots, H_k$ must be accessed. For large n and k, this usually implies a
prohibitive amount of data transfer.

Thus, there is a high price associated with complete reorthogonalization. 
Fortunately, there are more effective courses of action to take, but these 
demand that we look more closely at how orthogonality is lost. 
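
To make the O(kn) overhead of complete reorthogonalization concrete, here is a small Python sketch. Note the substitution: instead of the Householder scheme (9.2.3), it uses two sweeps of modified Gram-Schmidt against all stored Lanczos vectors, a simpler stand-in that is chosen here for brevity and that does not share the Householder approach's guarantees when $\beta_k$ is tiny. The function name and the `steps` argument are hypothetical.

```python
import math


def lanczos_full_reorth(matvec, q1, steps):
    """Lanczos with each residual re-orthogonalized against all earlier vectors.

    Gram-Schmidt stand-in for (9.2.3); returns (Q, alpha, beta) where Q is the
    list of stored Lanczos vectors.
    """
    Q = [list(q1)]
    alpha, beta = [], []
    for k in range(steps):
        r = matvec(Q[k])
        a = sum(x * y for x, y in zip(Q[k], r))
        alpha.append(a)
        r = [ri - a * qi for ri, qi in zip(r, Q[k])]
        if k > 0:
            r = [ri - beta[k - 1] * qi for ri, qi in zip(r, Q[k - 1])]
        for _ in range(2):          # two sweeps over all k+1 stored vectors
            for q in Q:             # this inner loop is the O(kn) overhead
                h = sum(x * y for x, y in zip(q, r))
                r = [ri - h * qi for ri, qi in zip(r, q)]
        b = math.sqrt(sum(ri * ri for ri in r))
        beta.append(b)
        if b == 0.0 or k == steps - 1:
            break
        Q.append([ri / b for ri in r])
    return Q, alpha, beta
```

Every stored vector is touched at every step, which is exactly the data movement that the text describes as prohibitive for large n and k.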


9.2.4 Selective Orthogonalization 


A remarkable, ironic consequence of the Paige (1971) error analysis is that
loss of orthogonality goes hand in hand with convergence of a Ritz pair.
To be precise, suppose the symmetric QR algorithm is applied to $\hat{T}_k$ and
renders computed Ritz values $\hat{\theta}_1, \ldots, \hat{\theta}_k$ and a nearly orthogonal matrix of
eigenvectors $\hat{S}_k = (\hat{s}_{pq})$. If $\hat{Y}_k = [\hat{y}_1, \ldots, \hat{y}_k] = fl(\hat{Q}_k \hat{S}_k)$, then it can be
shown that for $i = 1{:}k$ we have

$$ | \hat{q}_{k+1}^T \hat{y}_i | \approx \frac{u \| A \|_2}{| \hat{\beta}_k | \, | \hat{s}_{ki} |} \tag{9.2.4} $$

and

$$ \| A \hat{y}_i - \hat{\theta}_i \hat{y}_i \|_2 \approx | \hat{\beta}_k | \, | \hat{s}_{ki} |. \tag{9.2.5} $$


That is, the most recently computed Lanczos vector $\hat{q}_{k+1}$ tends to have a
nontrivial and unwanted component in the direction of any converged Ritz
vector. Consequently, instead of orthogonalizing $\hat{q}_{k+1}$ against all of the
previously computed Lanczos vectors, we can achieve the same effect by
orthogonalizing it against the much smaller set of converged Ritz vectors.

The practical aspects of enforcing orthogonality in this way are
discussed in Parlett and Scott (1979). In their scheme, known as selective
orthogonalization, a computed Ritz pair $(\hat{\theta}, \hat{y})$ is called “good” if it satisfies

$$ \| A \hat{y} - \hat{\theta} \hat{y} \|_2 \leq \sqrt{u} \, \| A \|_2. $$

As soon as $\hat{q}_{k+1}$ is computed, it is orthogonalized against each good Ritz
vector. This is much less costly than complete reorthogonalization, since
there are usually many fewer good Ritz vectors than Lanczos vectors.

One way to implement selective orthogonalization is to diagonalize $\hat{T}_k$ at
each step and then examine the $\hat{s}_{ki}$ in light of (9.2.4) and (9.2.5). A much
more efficient approach is to estimate the loss-of-orthogonality measure
$\| I_k - \hat{Q}_k^T \hat{Q}_k \|_2$ using the following result:


Lemma 9.2.1 Suppose $S_+ = [S \;\; d]$ where $S \in \mathbb{R}^{n \times k}$ and $d \in \mathbb{R}^n$. If $S$
satisfies $\| I_k - S^T S \|_2 \leq \mu$ and $|1 - d^T d| \leq \delta$ then $\| I_{k+1} - S_+^T S_+ \|_2 \leq \mu_+$
where

$$ \mu_+ = \frac{1}{2} \left( \mu + \delta + \sqrt{(\mu - \delta)^2 + 4 \| S^T d \|_2^2} \right). $$

Proof. See Kahan and Parlett (1974) or Parlett and Scott (1979). $\Box$


Thus, if we have a bound for $\| I_k - \hat{Q}_k^T \hat{Q}_k \|_2$ we can generate a bound for
$\| I_{k+1} - \hat{Q}_{k+1}^T \hat{Q}_{k+1} \|_2$ by applying the lemma with $S = \hat{Q}_k$ and $d = \hat{q}_{k+1}$.
(In this case $\delta \approx u$ and we assume that $\hat{q}_{k+1}$ has been orthogonalized against
the set of currently good Ritz vectors.) It is possible to estimate the norm
of $\hat{Q}_k^T \hat{q}_{k+1}$ from a simple recurrence that spares one the need for accessing
$\hat{q}_1, \ldots, \hat{q}_k$. See Kahan and Parlett (1974) or Parlett and Scott (1979). The
overhead is minimal, and when the bounds signal loss of orthogonality, it is
time to contemplate the enlargement of the set of good Ritz vectors. Then
and only then is $\hat{T}_k$ diagonalized.
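
The bound of Lemma 9.2.1 is cheap to evaluate once an estimate of $\| S^T d \|_2$ is in hand. A minimal sketch follows; the function name `orthogonality_bound` is hypothetical, and the estimates fed to it in the usage lines are made-up numbers for illustration only, not output of the recurrence cited above.

```python
import math


def orthogonality_bound(mu, delta, norm_St_d):
    """mu_+ from Lemma 9.2.1, given mu, delta and an estimate of ||S^T d||_2."""
    return 0.5 * (mu + delta + math.sqrt((mu - delta) ** 2 + 4.0 * norm_St_d ** 2))


# Usage: propagate the bound over a few steps with illustrative estimates.
u = 2.0 ** -53              # unit roundoff for IEEE double precision
mu = u                      # bound for a single unit-length vector
for est in (1e-15, 1e-12, 1e-9):   # hypothetical estimates of ||Q_k^T q_{k+1}||_2
    mu = orthogonality_bound(mu, u, est)
```

Since $\mu_+$ is at least the estimate of $\| S^T d \|_2$, the bound grows as soon as a new vector has a noticeable component along its predecessors.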


9.2.5 The Ghost Eigenvalue Problem 


Considerable effort has been spent in trying to develop a workable Lanczos
procedure that does not involve any kind of orthogonality enforcement.
Research in this direction focuses on the problem of “ghost” or “spurious”
eigenvalues. These are multiple eigenvalues of $\hat{T}_k$ that correspond to simple
eigenvalues of $A$. They arise because the iteration essentially restarts
itself when orthogonality to a converged Ritz vector is lost. (By way of
analogy, consider what would happen during orthogonal iteration (§8.2.8) if
we “forgot” to orthogonalize.)

The problem of identifying ghost eigenvalues and coping with their presence
is discussed in Cullum and Willoughby (1979) and Parlett and Reid
(1981). It is a particularly pressing problem in those applications where all
of $A$'s eigenvalues are desired, for then the above orthogonalization
procedures are too expensive to implement.

Difficulties with the Lanczos iteration can be expected even if $A$ has a
genuinely multiple eigenvalue. This follows because the $\hat{T}_k$ are unreduced,
and unreduced tridiagonal matrices cannot have multiple eigenvalues. Our
next practical Lanczos procedure attempts to circumvent this difficulty.


9.2.6 Block Lanczos 


Just as the simple power method has a block analog in simultaneous
iteration, so does the Lanczos algorithm have a block version. Suppose $n = rp$
and consider the decomposition

$$ Q^T A Q = \bar{T} = \begin{bmatrix} M_1 & B_1^T & & \cdots & 0 \\ B_1 & M_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & B_{r-1}^T \\ 0 & \cdots & & B_{r-1} & M_r \end{bmatrix} \tag{9.2.6} $$

where

$$ Q = [X_1, \ldots, X_r], \qquad X_i \in \mathbb{R}^{n \times p} $$

is orthogonal, each $M_i \in \mathbb{R}^{p \times p}$, and each $B_i \in \mathbb{R}^{p \times p}$ is upper triangular.
Comparing blocks in $AQ = Q\bar{T}$ shows that

$$ A X_k = X_{k-1} B_{k-1}^T + X_k M_k + X_{k+1} B_k, \qquad X_0 B_0^T \equiv 0 $$

for $k = 1{:}r-1$. From the orthogonality of $Q$ we have

$$ M_k = X_k^T A X_k $$

for $k = 1{:}r$. Moreover, if we define

$$ R_k = A X_k - X_k M_k - X_{k-1} B_{k-1}^T \in \mathbb{R}^{n \times p} $$

then $X_{k+1} B_k = R_k$ is a QR factorization of $R_k$. These observations suggest
that the block tridiagonal matrix $\bar{T}$ in (9.2.6) can be generated as follows:

    $X_1 \in \mathbb{R}^{n \times p}$ given with $X_1^T X_1 = I_p$.
    $M_1 = X_1^T A X_1$
    for $k = 1{:}r-1$                                                      (9.2.7)
        $R_k = A X_k - X_k M_k - X_{k-1} B_{k-1}^T$   ($X_0 B_0^T \equiv 0$)
        $X_{k+1} B_k = R_k$   (QR factorization of $R_k$)
        $M_{k+1} = X_{k+1}^T A X_{k+1}$
    end


At the beginning of the kth pass through the loop we have

$$ A [X_1, \ldots, X_k] = [X_1, \ldots, X_k] \bar{T}_k + R_k [0, \ldots, 0, I_p] \tag{9.2.8} $$

where

$$ \bar{T}_k = \begin{bmatrix} M_1 & B_1^T & & \cdots & 0 \\ B_1 & M_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & B_{k-1}^T \\ 0 & \cdots & & B_{k-1} & M_k \end{bmatrix}. $$

Using an argument similar to the one used in the proof of Theorem 9.1.1,
we can show that the $X_k$ are mutually orthogonal provided none of the $R_k$
are rank-deficient. However if $\mathrm{rank}(R_k) < p$ for some k, then it is possible
to choose the columns of $X_{k+1}$ such that $X_{k+1}^T X_i = 0$, for $i = 1{:}k$. See
Golub and Underwood (1977).

Because $\bar{T}_k$ has bandwidth p, it can be efficiently reduced to tridiagonal
form using an algorithm of Schwartz (1968). Once tridiagonal form is
achieved, the Ritz values can be obtained via the symmetric QR algorithm.

In order to intelligently decide when to use block Lanczos, it is necessary
to understand how the block dimension affects convergence of the Ritz
values. The following generalization of Theorem 9.1.3 sheds light on this
issue.


Theorem 9.2.2 Let $A$ be an n-by-n symmetric matrix with eigenvalues
$\lambda_1 \geq \cdots \geq \lambda_n$ and corresponding orthonormal eigenvectors $z_1, \ldots, z_n$. Let
$\mu_1 \geq \cdots \geq \mu_p$ be the p largest eigenvalues of the matrix $\bar{T}_k$ obtained after
k steps of the block Lanczos iteration (9.2.7). If $Z_1 = [z_1, \ldots, z_p]$ and
$\cos(\theta_p) = \sigma_p(Z_1^T X_1) > 0$, then for $i = 1{:}p$, $\lambda_i \geq \mu_i \geq \lambda_i - \epsilon_i^2$ where

$$ \epsilon_i^2 = \frac{(\lambda_i - \lambda_n) \tan^2(\theta_p)}{\left[ c_{k-1}\left( \dfrac{1 + \gamma_i}{1 - \gamma_i} \right) \right]^2}, \qquad \gamma_i = \frac{\lambda_i - \lambda_{p+1}}{\lambda_i - \lambda_n} $$

and $c_{k-1}(z)$ is the Chebyshev polynomial of degree $k - 1$.

Proof. See Underwood (1975). $\Box$


Analogous inequalities can be obtained for $\bar{T}_k$'s smallest eigenvalues by
applying the theorem with $A$ replaced by $-A$.

Based on Theorem 9.2.2 and scrutiny of the block Lanczos iteration
(9.2.7) we may conclude that:


• the error bounds for the Ritz values improve with increased p.


• the amount of work required to compute $\bar{T}_k$'s eigenvalues is
proportional to $p^2$.


• the block dimension should be at least as large as the largest
multiplicity of any sought-after eigenvalue.


How to determine block dimension in the face of these tradeoffs is discussed
in detail by Scott (1979).

Loss of orthogonality also plagues the block Lanczos algorithm. However,
all of the orthogonality enforcement schemes described above can be
extended to the block setting.


9.2.7  s-Step Lanczos 


The block Lanczos algorithm (9.2.7) can be used in an iterative fashion
to calculate selected eigenvalues of $A$. To fix ideas, suppose we wish to
calculate the p largest eigenvalues. If $X_1 \in \mathbb{R}^{n \times p}$ is a given matrix having
orthonormal columns, we may proceed as follows:


    until $\| A X_1 - X_1 \bar{T}_1 \|$ is small enough
        Generate $X_2, \ldots, X_s \in \mathbb{R}^{n \times p}$ via the block Lanczos algorithm.
        Form $\bar{T}_s = [X_1, \ldots, X_s]^T A [X_1, \ldots, X_s]$, an sp-by-sp,
            p-diagonal matrix.
        Compute an orthogonal matrix $U = [u_1, \ldots, u_{sp}]$ such that
            $U^T \bar{T}_s U = \mathrm{diag}(\theta_1, \ldots, \theta_{sp})$ with $\theta_1 \geq \cdots \geq \theta_{sp}$.
        Set $X_1 = [X_1, \ldots, X_s][u_1, \ldots, u_p]$.
    end


This is the block analog of the s-step Lanczos algorithm, which has been
extensively analyzed by Cullum and Donath (1974) and Underwood (1975).

The same idea can also be used to compute several of $A$'s smallest
eigenvalues or a mixture of both large and small eigenvalues. See Cullum (1978).
The choice of the parameters s and p depends upon storage constraints as
well as upon the factors we mentioned above in our discussion of block
dimension. The block dimension p may be diminished as the good Ritz
vectors emerge. However this demands that orthogonality to the converged
vectors be enforced. See Cullum and Donath (1974).


Problems 


P9.2.1 Prove Lemma 9.2.1. 


P9.2.2 If $\mathrm{rank}(R_k) < p$ in (9.2.7), does it follow that $\mathrm{ran}([X_1, \ldots, X_k])$ contains an
eigenvector of $A$?


9.3 Applications to Ax = b and Least Squares


In this section we briefly show how the Lanczos iteration can be embellished 
to solve large sparse linear equation and least squares problems. For further 
details, we recommend Saunders (1995). 


9.3.1 Symmetric Positive Definite Systems 


Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric and positive definite and consider the
functional $\phi(x)$ defined by

$$ \phi(x) = \frac{1}{2} x^T A x - x^T b $$

where $b \in \mathbb{R}^n$. Since $\nabla\phi(x) = Ax - b$, it follows that $x = A^{-1} b$ is the unique
minimizer of $\phi$. Hence, an approximate minimizer of $\phi$ can be regarded as
an approximate solution to $Ax = b$.

Suppose $x_0 \in \mathbb{R}^n$ is an initial guess. One way to produce a vector
sequence $\{x_k\}$ that converges to $x$ is to generate a sequence of orthonormal
vectors $\{q_k\}$ and to let $x_k$ minimize $\phi$ over the set

$$ x_0 + \mathrm{span}\{q_1, \ldots, q_k\} = \{ x_0 + a_1 q_1 + \cdots + a_k q_k : a_i \in \mathbb{R} \} $$

for $k = 1{:}n$. If $Q_k = [q_1, \ldots, q_k]$, then this just means choosing $y \in \mathbb{R}^k$
such that

$$ \phi(x_0 + Q_k y) = \frac{1}{2} (x_0 + Q_k y)^T A (x_0 + Q_k y) - (x_0 + Q_k y)^T b $$
$$ \phantom{\phi(x_0 + Q_k y)} = \frac{1}{2} y^T (Q_k^T A Q_k) y - y^T Q_k^T (b - A x_0) + \phi(x_0) $$

is minimized. By looking at the gradient of this expression with respect to
$y$ we see that

$$ x_k = x_0 + Q_k y_k \tag{9.3.1} $$

where

$$ (Q_k^T A Q_k) y_k = Q_k^T (b - A x_0). \tag{9.3.2} $$

When $k = n$ the minimization is over all of $\mathbb{R}^n$ and so $A x_n = b$.
For large sparse $A$ it is necessary to overcome two hurdles in order to
make this an effective solution process:


• the linear system (9.3.2) must be “easily” solved.
• we must be able to compute $x_k$ without having to refer to $q_1, \ldots, q_k$
explicitly as (9.3.1) suggests. Otherwise there would be an excessive
amount of data movement.


We show that both of these requirements are met if the $q_k$ are Lanczos
vectors.
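After k steps of the Lanczos algorithm we obtain the factorization

$$ A Q_k = Q_k T_k + r_k e_k^T \tag{9.3.3} $$

where

$$ T_k = Q_k^T A Q_k = \begin{bmatrix} \alpha_1 & \beta_1 & & \cdots & 0 \\ \beta_1 & \alpha_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & \beta_{k-1} \\ 0 & \cdots & & \beta_{k-1} & \alpha_k \end{bmatrix}. \tag{9.3.4} $$

With this approach (9.3.2) becomes a symmetric positive definite tridiagonal
system which may be solved via the $LDL^T$ factorization. (See
Algorithm 4.3.6.) In particular, by setting

$$ L_k = \begin{bmatrix} 1 & 0 & \cdots & & 0 \\ \mu_1 & 1 & & & \vdots \\ & \ddots & \ddots & & \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & & \mu_{k-1} & 1 \end{bmatrix} \qquad \text{and} \qquad D_k = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & d_k \end{bmatrix} $$

we find by comparing entries in

$$ T_k = L_k D_k L_k^T \tag{9.3.5} $$

that

    $d_1 = \alpha_1$
    for $i = 2{:}k$
        $\mu_{i-1} = \beta_{i-1} / d_{i-1}$
        $d_i = \alpha_i - \beta_{i-1} \mu_{i-1}$
    end


Note that we need only calculate the quantities

$$ \mu_{k-1} = \beta_{k-1} / d_{k-1}, \qquad d_k = \alpha_k - \beta_{k-1} \mu_{k-1} \tag{9.3.6} $$

in order to obtain $L_k$ and $D_k$ from $L_{k-1}$ and $D_{k-1}$.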
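
The recurrence for the $\mu_i$ and $d_i$ is short enough to state directly in code. The following Python sketch (the name `tridiag_ldlt` is hypothetical) takes the diagonal and off-diagonal of $T_k$ and returns the multipliers and pivots. It assumes, as in the text, that $T_k$ is positive definite so that no $d_i$ vanishes.

```python
def tridiag_ldlt(alpha, beta):
    """LDL^T factors of a symmetric positive definite tridiagonal matrix.

    alpha = [alpha_1, ..., alpha_k], beta = [beta_1, ..., beta_{k-1}].
    Returns (mu, d) with mu = [mu_1, ..., mu_{k-1}], d = [d_1, ..., d_k].
    """
    d = [alpha[0]]
    mu = []
    for i in range(1, len(alpha)):
        mu.append(beta[i - 1] / d[i - 1])           # mu_i = beta_i / d_i
        d.append(alpha[i] - beta[i - 1] * mu[-1])   # d_{i+1} = alpha_{i+1} - beta_i mu_i
    return mu, d


mu, d = tridiag_ldlt([4.0, 3.0], [1.0])
```

Each pass through the loop uses only the previous pivot, which is why the factorization can be extended by one row and column per Lanczos step.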
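As we mentioned, it is critical to be able to compute $x_k$ in (9.3.1)
efficiently. To this end we define $C_k \in \mathbb{R}^{n \times k}$ and $p_k \in \mathbb{R}^k$ by the equations

$$ C_k L_k^T = Q_k, \qquad L_k D_k p_k = Q_k^T (b - A x_0) \tag{9.3.7} $$

and observe that if $r_0 = b - A x_0$ then

$$ x_k = x_0 + Q_k T_k^{-1} Q_k^T r_0 = x_0 + Q_k (L_k D_k L_k^T)^{-1} Q_k^T r_0 = x_0 + C_k p_k. $$

Let $C_k = [c_1, \ldots, c_k]$ be a column partitioning. It follows from (9.3.7) that

$$ [c_1, \; \mu_1 c_1 + c_2, \; \ldots, \; \mu_{k-1} c_{k-1} + c_k] = [q_1, \ldots, q_k] $$

and therefore $C_k = [C_{k-1}, \; c_k]$ where

$$ c_k = q_k - \mu_{k-1} c_{k-1}. $$

Also observe that if we set $p_k = [\rho_1, \ldots, \rho_k]^T$ in $L_k D_k p_k = Q_k^T r_0$, then
that equation becomes

$$ \begin{bmatrix} L_{k-1} D_{k-1} & 0 \\ \begin{matrix} 0 & \cdots & 0 & \mu_{k-1} d_{k-1} \end{matrix} & d_k \end{bmatrix} \begin{bmatrix} \rho_1 \\ \vdots \\ \rho_{k-1} \\ \rho_k \end{bmatrix} = \begin{bmatrix} q_1^T r_0 \\ \vdots \\ q_{k-1}^T r_0 \\ q_k^T r_0 \end{bmatrix}. $$

Since $L_{k-1} D_{k-1} p_{k-1} = Q_{k-1}^T r_0$, it follows that the first $k - 1$ components of
$p_k$ are those of $p_{k-1}$, and the last row gives $\rho_k$, where

$$ \rho_k = (q_k^T r_0 - \mu_{k-1} d_{k-1} \rho_{k-1}) / d_k $$

and thus,

$$ x_k = x_0 + C_k p_k = x_0 + C_{k-1} p_{k-1} + \rho_k c_k = x_{k-1} + \rho_k c_k. $$

This is precisely the kind of recursive formula for $x_k$ that we need.
Together with (9.3.6) and (9.3.7) it enables us to make the transition from
$(q_{k-1}, c_{k-1}, x_{k-1})$ to $(q_k, c_k, x_k)$ with minimal work and storage.

A further simplification results if we set $q_1$ to be a unit vector in the
direction of the initial residual $r_0 = b - A x_0$. With this choice for a Lanczos
starting vector, $q_k^T r_0 = 0$ for $k \geq 2$. It follows from (9.3.3) that

$$ b - A x_k = b - A(x_0 + Q_k y_k) = r_0 - (Q_k T_k + r_k e_k^T) y_k = r_0 - Q_k Q_k^T r_0 - r_k e_k^T y_k = -r_k e_k^T y_k. $$

Thus, if $\beta_k = \| r_k \|_2 = 0$ in the Lanczos iteration, then $A x_k = b$. Moreover,
$\| A x_k - b \|_2 = \beta_k | e_k^T y_k |$ and so estimates of the current residual can be
obtained as a by-product of the iteration. Overall, we have the following
procedure.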


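Algorithm 9.3.1 If $A \in \mathbb{R}^{n \times n}$ is symmetric positive definite, $b \in \mathbb{R}^n$, and
$x_0 \in \mathbb{R}^n$ is an initial guess ($A x_0 \approx b$), then this algorithm computes the
solution to $Ax = b$.


    $r_0 = b - A x_0$
    $\beta_0 = \| r_0 \|_2$
    $q_0 = 0$
    $k = 0$
    while $\beta_k \neq 0$
        $q_{k+1} = r_k / \beta_k$
        $k = k + 1$
        $\alpha_k = q_k^T A q_k$
        $r_k = (A - \alpha_k I) q_k - \beta_{k-1} q_{k-1}$
        $\beta_k = \| r_k \|_2$
        if $k = 1$
            $d_1 = \alpha_1$
            $c_1 = q_1$
            $\rho_1 = \beta_0 / \alpha_1$
            $x_1 = x_0 + \rho_1 q_1$
        else
            $\mu_{k-1} = \beta_{k-1} / d_{k-1}$
            $d_k = \alpha_k - \beta_{k-1} \mu_{k-1}$
            $c_k = q_k - \mu_{k-1} c_{k-1}$
            $\rho_k = -\mu_{k-1} d_{k-1} \rho_{k-1} / d_k$
            $x_k = x_{k-1} + \rho_k c_k$
        end
    end
    $x = x_k$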

This algorithm requires one matrix-vector multiplication and a couple of 
saxpy operations per iteration. The numerical behavior of Algorithm 9.3.1 
is discussed in the next chapter, where it is rederived and identified as the 
widely known method of conjugate gradients. 
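
For readers who want to experiment, Algorithm 9.3.1 might be transcribed into Python along the following lines. This is a sketch under stated assumptions: the name `lanczos_solve`, the iteration cap `max_steps` and the threshold `tol` on $\beta_k$ are additions made here for floating point use (the algorithm as stated loops until $\beta_k = 0$), the first update is written as $x_1 = x_0 + \rho_1 q_1$ in line with the recursion $x_k = x_{k-1} + \rho_k c_k$, and dense list arithmetic is used only to keep the example self-contained.

```python
import math


def lanczos_solve(matvec, b, x0, max_steps, tol=1e-12):
    """Sketch of Algorithm 9.3.1 for symmetric positive definite A.

    matvec(x) returns A*x. Stops when beta_k <= tol or after max_steps steps.
    """
    n = len(b)
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(x))]        # r_0 = b - A x_0
    beta0 = beta = math.sqrt(sum(ri * ri for ri in r))   # beta_0
    q_prev = [0.0] * n                                   # q_0 = 0
    c, d, rho = None, None, None
    k = 0
    while beta > tol and k < max_steps:
        q = [ri / beta for ri in r]                      # q_{k+1} = r_k / beta_k
        k += 1
        Aq = matvec(q)
        alpha = sum(qi * yi for qi, yi in zip(q, Aq))    # alpha_k
        # r_k = (A - alpha_k I) q_k - beta_{k-1} q_{k-1}
        r = [Aq[i] - alpha * q[i] - beta * q_prev[i] for i in range(n)]
        beta_prev, beta = beta, math.sqrt(sum(ri * ri for ri in r))
        if k == 1:
            d, c, rho = alpha, q, beta0 / alpha
        else:
            mu = beta_prev / d                           # mu_{k-1}
            d_new = alpha - beta_prev * mu               # d_k
            c = [q[i] - mu * c[i] for i in range(n)]     # c_k
            rho = -mu * d * rho / d_new                  # rho_k
            d = d_new
        x = [x[i] + rho * c[i] for i in range(n)]        # x_k = x_{k-1} + rho_k c_k
        q_prev = q
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
x = lanczos_solve(lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A],
                  [1.0, 2.0], [0.0, 0.0], max_steps=2)
```

Only the current and previous Lanczos vectors, the direction `c` and the iterate `x` are stored, which matches the storage pattern the derivation was designed to achieve.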


9.3.2 Symmetric Indefinite Systems 


A key feature in the above development is the idea of computing the $LDL^T$
factorization of the tridiagonal matrices $T_k$. Unfortunately, this is
potentially unstable if $A$, and consequently $T_k$, is not positive definite. A way
around this difficulty proposed by Paige and Saunders (1975) is to develop
the recursion for $x_k$ via an “LQ” factorization of $T_k$. In particular, at the
kth step of the iteration, we have Givens rotations $J_1, \ldots, J_{k-1}$ such that

$$ T_k J_1 \cdots J_{k-1} = L_k = \begin{bmatrix} d_1 & 0 & 0 & \cdots & 0 \\ e_1 & d_2 & 0 & \cdots & 0 \\ f_1 & e_2 & d_3 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & f_{k-2} & e_{k-1} & d_k \end{bmatrix}. $$

Note that with this factorization $x_k$ is given by

$$ x_k = x_0 + Q_k y_k = x_0 + Q_k T_k^{-1} Q_k^T (b - A x_0) = x_0 + W_k s_k $$

where

$$ W_k = Q_k J_1 \cdots J_{k-1} \in \mathbb{R}^{n \times k} $$

and $s_k \in \mathbb{R}^k$ solves

$$ L_k s_k = Q_k^T (b - A x_0). $$

Scrutiny of these equations enables one to develop a formula for computing
$x_k$ from $x_{k-1}$ and an easily computed multiple of $w_k$, the last column of
$W_k$. This defines the SYMMLQ method set forth in Paige and Saunders
(1975).

A different idea is to notice from (9.3.3) and the definition $\beta_k q_{k+1} = r_k$
that

$$ A Q_k = Q_k T_k + \beta_k q_{k+1} e_k^T = Q_{k+1} H_k $$

where

$$ H_k = \begin{bmatrix} T_k \\ \beta_k e_k^T \end{bmatrix}. $$

This $(k+1)$-by-$k$ matrix is upper Hessenberg and figures in the MINRES
method of Paige and Saunders (1975). In this technique $x_k$ minimizes
$\| Ax - b \|_2$ over the set $x_0 + \mathrm{span}\{q_1, \ldots, q_k\}$. Note that

$$ \| A(x_0 + Q_k y) - b \|_2 = \| A Q_k y - (b - A x_0) \|_2 = \| Q_{k+1} H_k y - (b - A x_0) \|_2 = \| H_k y - \beta_0 e_1 \|_2 $$

where it is assumed that $q_1 = (b - A x_0)/\beta_0$ is a unit vector. As in SYMMLQ,
it is possible to develop recursions that permit the efficient computation of
$x_k$ from its predecessor $x_{k-1}$. The QR factorization of $H_k$ is involved.

The behavior of the conjugate gradient method is detailed in the next
chapter. The convergence of SYMMLQ and MINRES is more complicated
and is discussed in Paige, Parlett, and Van Der Vorst (1995).


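9.3.3 Bidiagonalization and the SVD


Suppose $U^T A V = B$ represents the bidiagonalization of $A \in \mathbb{R}^{m \times n}$ with

\[
U = [\,u_1,\ldots,u_m\,], \quad U^T U = I_m, \qquad
V = [\,v_1,\ldots,v_n\,], \quad V^T V = I_n
\]

and

\[
B = \begin{bmatrix}
\alpha_1 & \beta_1 & \cdots & 0 \\
0 & \alpha_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \beta_{n-1} \\
0 & \cdots & 0 & \alpha_n
\end{bmatrix}. \qquad (9.3.8)
\]
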

Recall from §5.4.3 that this factorization may be computed using House- 
holder transformations and that it serves as a front end for the SVD algo- 
rithm. 

Unfortunately, if A is large and sparse, then we can expect large, dense 
submatrices to arise during the Householder bidiagonalization. Conse- 
quently, it would be nice to develop a means for computing B directly 
without any orthogonal updates of the matrix A. 

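Proceeding just as we did in §9.1.2 we compare columns in the equations
$AV = UB$ and $A^T U = V B^T$ for $k = 1{:}n$ and obtain

\[
\begin{aligned}
A v_k &= \alpha_k u_k + \beta_{k-1} u_{k-1}, & \beta_0 u_0 &\equiv 0 \\
A^T u_k &= \alpha_k v_k + \beta_k v_{k+1}, & \beta_n v_{n+1} &\equiv 0
\end{aligned}
\qquad (9.3.9)
\]

Defining

\[ r_k = A v_k - \beta_{k-1} u_{k-1}, \qquad p_k = A^T u_k - \alpha_k v_k \]

we may conclude from orthonormality that $\alpha_k = \pm\| r_k \|_2$, $u_k = r_k/\alpha_k$,
$\beta_k = \pm\| p_k \|_2$, and $v_{k+1} = p_k/\beta_k$. Properly sequenced, these equations
define the Lanczos method for bidiagonalizing a rectangular matrix: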


    $v_1$ = given unit 2-norm $n$-vector
    $p_0 = v_1$; $\beta_0 = 1$; $k = 0$; $u_0 = 0$
    while $\beta_k \neq 0$
        $v_{k+1} = p_k/\beta_k$
        $k = k+1$
        $r_k = A v_k - \beta_{k-1} u_{k-1}$                    (9.3.10)
        $\alpha_k = \| r_k \|_2$
        $u_k = r_k/\alpha_k$
        $p_k = A^T u_k - \alpha_k v_k$
        $\beta_k = \| p_k \|_2$
    end

If rank$(A) = n$, then we can guarantee that no zero $\alpha_k$ arise. Indeed, if
$\alpha_k = 0$ then span$\{Av_1,\ldots,Av_k\} \subset$ span$\{u_1,\ldots,u_{k-1}\}$ which implies rank
deficiency.
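
The recursion (9.3.10) can be tried out on a small dense matrix. The following is a minimal sketch only, not a production routine: the function names (`matvec`, `rmatvec`, `norm2`, `lanczos_bidiag`), the step limit `steps`, and the tolerance `tol` that replaces the exact tests for zero are all assumptions made for illustration, and no reorthogonalization is attempted.

```python
import math

def matvec(A, x):
    """Return A x for a dense matrix stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def rmatvec(A, y):
    """Return A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def norm2(x):
    return math.sqrt(sum(t * t for t in x))

def lanczos_bidiag(A, v1, steps, tol=1e-12):
    """Run at most `steps` passes of (9.3.10).

    Returns (alphas, betas, U, V) with alphas[k-1] = alpha_k and
    betas[k-1] = beta_k. `tol` is an assumed stand-in for the exact
    tests beta_k != 0 and alpha_k != 0.
    """
    alphas, betas, U, V = [], [], [], []
    p, beta = list(v1), 1.0              # p_0 = v_1, beta_0 = 1
    u_prev = [0.0] * len(A)              # u_0 = 0
    for _ in range(steps):
        if beta <= tol:
            break
        v = [t / beta for t in p]        # v_{k+1} = p_k / beta_k
        # r_k = A v_k - beta_{k-1} u_{k-1}
        r = [a - beta * u for a, u in zip(matvec(A, v), u_prev)]
        alpha = norm2(r)
        if alpha <= tol:                 # rank deficiency, cf. P9.3.3
            break
        u = [t / alpha for t in r]
        # p_k = A^T u_k - alpha_k v_k
        p = [a - alpha * t for a, t in zip(rmatvec(A, u), v)]
        beta = norm2(p)
        alphas.append(alpha)
        betas.append(beta)
        U.append(u)
        V.append(v)
        u_prev = u
    return alphas, betas, U, V

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
alphas, betas, U, V = lanczos_bidiag(A, [1.0, 0.0], 2)
```

For this 3-by-2 example the process finishes in two steps with $\beta_2$ at roundoff level, and the squares of $\alpha_1$, $\beta_1$, $\alpha_2$ sum to the squared Frobenius norm of $A$, as they must for an orthogonal reduction.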

If $\beta_k = 0$, then it is not hard to verify that

\[ A[\,v_1,\ldots,v_k\,] = [\,u_1,\ldots,u_k\,] B_k \]

\[ A^T[\,u_1,\ldots,u_k\,] = [\,v_1,\ldots,v_k\,] B_k^T \]

where $B_k = B(1{:}k,1{:}k)$ and $B$ is prescribed by (9.3.8). Thus, the $v$ vectors
and the $u$ vectors are singular vectors and $\sigma(B_k) \subset \sigma(A)$. Lanczos
bidiagonalization is discussed in Paige (1974). See also Cullum and Willoughby
(1985a, 1985b). It is essentially equivalent to applying the Lanczos
tridiagonalization scheme to the symmetric matrix

\[ C = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}. \]

We showed that $\lambda_i(C) = \sigma_i(A) = -\lambda_{n+m-i+1}(C)$ for $i = 1{:}n$ at the
beginning of §8.6. Because of this, it is not surprising that the large singular 
values of the bidiagonal matrix tend to be very good approximations to the 
large singular values of A. The small singular values of A correspond to the 
interior eigenvalues of C and are not so well approximated. The equivalent 
of the Kaniel-Paige theory for the Lanczos bidiagonalization may be found 
in Luk (1978) as well as in Golub, Luk, and Overton (1981). The analytic, 
algorithmic, and numerical developments of the previous two sections all 
carry over naturally to the bidiagonalization. 


9.3.4 Least Squares 


The full-rank LS problem $\min \| Ax - b \|_2$ can be solved via the
bidiagonalization. In particular,

\[ x_{LS} = V y_{LS} = \sum_{i=1}^{n} y_i v_i \]

where $y = [\,y_1,\ldots,y_n\,]^T$ solves the system $By = [\,u_1^T b,\ldots,u_n^T b\,]^T$. Note
that because $B$ is upper bidiagonal, we cannot solve for $y$ until the
bidiagonalization is complete. Moreover, we are required to save the vectors
$v_1,\ldots,v_n$, an unhappy circumstance if $n$ is large.

The development of a sparse least squares algorithm based on the
bidiagonalization can be accomplished more favorably if $A$ is reduced to lower
bidiagonal form

\[
U^T A V = B =
\begin{bmatrix}
\alpha_1 & 0 & \cdots & 0 \\
\beta_1 & \alpha_2 & \ddots & \vdots \\
0 & \beta_2 & \ddots & 0 \\
\vdots & \ddots & \ddots & \alpha_n \\
0 & \cdots & 0 & \beta_n \\
\multicolumn{4}{c}{0}
\end{bmatrix}
\]

where $V = [\,v_1,\ldots,v_n\,]$ and $U = [\,u_1,\ldots,u_m\,]$ are orthogonal. Comparing
columns in the equations $A^T U = V B^T$ and $AV = UB$ we obtain

\[
\begin{aligned}
A^T u_k &= \beta_{k-1} v_{k-1} + \alpha_k v_k, & \beta_0 v_0 &\equiv 0 \\
A v_k &= \alpha_k u_k + \beta_k u_{k+1}
\end{aligned}
\]

It is straightforward to develop a Lanczos procedure from these equations
and the resulting algorithm is very similar to (9.3.10), only $u_1$ is the starting
vector.

Define the matrices $V_k = [\,v_1,\ldots,v_k\,]$, $U_k = [\,u_1,\ldots,u_k\,]$, and $B_k =
B(1{:}k+1, 1{:}k)$ and observe that $AV_k = U_{k+1}B_k$. Our goal is to compute $x_k$,
the minimizer of $\| Ax - b \|_2$ over all vectors of the form $x = x_0 + V_k y$, where
$y \in \mathbb{R}^k$ and $x_0 \in \mathbb{R}^n$ is an initial guess. If $u_1 = (b - Ax_0)/\| b - Ax_0 \|_2$, then

\[ A(x_0 + V_k y) - b = U_{k+1}B_k y - \beta_1 U_{k+1}e_1 = U_{k+1}(B_k y - \beta_1 e_1) \]

where $e_1 = I_{k+1}(:,1)$. It follows that if $y_k$ solves the $(k+1)$-by-$k$ lower
bidiagonal LS problem

\[ \min \| B_k y - \beta_1 e_1 \|_2 \]

then $x_k = x_0 + V_k y_k$. Since $B_k$ is lower bidiagonal, it is easy to compute
Givens rotations $J_1,\ldots,J_k$ such that

\[
J_k \cdots J_1 B_k = \begin{bmatrix} R_k \\ 0 \end{bmatrix}
\begin{matrix} k \\ 1 \end{matrix}
\]

is upper bidiagonal. If

\[
J_k \cdots J_1 \beta_1 e_1 = \begin{bmatrix} d_k \\ \ast \end{bmatrix}
\begin{matrix} k \\ 1 \end{matrix}
\]

then it follows that $x_k = x_0 + V_k y_k = x_0 + W_k d_k$ where $W_k = V_k R_k^{-1}$. Paige
and Saunders (1982a) show how $x_k$ can be obtained from $x_{k-1}$ via a simple
recursion that involves the last column of $W_k$. The net result is a sparse LS
algorithm referred to as LSQR that requires only a few $n$-vectors of storage
to implement.


Problems 


P9.3.1 Modify Algorithm 9.3.1 so that it implements the indefinite symmetric solver
outlined in §9.3.2.


P9.3.2 How many vector workspaces are required to implement efficiently (9.3.10)?


P9.3.3 Suppose $A$ is rank deficient and $\alpha_k = 0$ in (9.3.10). How could $u_k$ be obtained
so that the iteration could continue?


P9.3.4 Work out the lower bidiagonal version of (9.3.10) and detail the least squares
solver sketched in §9.3.4.


Notes and References for Sec. 9.3 


Much of the material in this section has been distilled from the following papers: 


C.C. Paige (1974). "Bidiagonalization of Matrices and Solution of Linear Equations," 
SIAM J. Num. Anal. 11, 197-209. 

C.C. Paige and M.A. Saunders (1975). "Solution of Sparse Indefinite Systems of Linear 
Equations,” SIAM J. Num. Anal. 12, 617-29. 

C.C. Paige and M.A. Saunders (1982a). “LSQR: An Algorithm for Sparse Linear Equa- 
tions and Sparse Least Squares,” ACM Trans. Math. Soft. 8, 43-71. 

C.C. Paige and M.A. Saunders (1982b). “Algorithm 583 LSQR: Sparse Linear Equations 
and Least Squares Problems,” ACM Trans. Math. Soft. 8, 195-209. 

M.A. Saunders (1995). “Solution of Sparse Rectangular Systems,” BIT 35, 588-604.


See also Cullum and Willoughby (1985a, 1985b) and


O. Widlund (1978). “A Lanczos Method for a Class of Nonsymmetric Systems of Linear 
Equations,” SIAM J. Numer. Anal. 15, 801-12. 

B.N. Parlett (1980). “A New Look at the Lanczos Algorithm for Solving Symmetric 
Systems of Linear Equations,” Lin. Alg. and Its Applic. 29, 323-46. 

G.H. Golub, F.T. Luk, and M. Overton (1981). “A Block Lanczos Method for Computing
the Singular Values and Corresponding Singular Vectors of a Matrix,” ACM Trans.
Math. Soft. 7, 149-69.

J. Cullum, R.A. Willoughby, and M. Lake (1983). “A Lanczos Algorithm for Computing 
Singular Values and Vectors of Large Matrices,” SIAM J. Sci. and Stat. Comp. 4, 
197-215. 

Y. Saad (1987). “On the Lanczos Method for Solving Symmetric Systems with Several 
Right Hand Sides," Math. Comp. 48, 651-662. 

M. Berry and G.H. Golub (1991). “Estimating the Largest Singular Values of Large 
Sparse Matrices via Modified Moments,” Numerical Algorithms 1, 353-374. 

C.C. Paige, B.N. Parlett, and H.A. Van Der Vorst (1995). “Approximate Solutions and
Eigenvalue Bounds from Krylov Subspaces,” Numer. Linear Algebra with Applic. 2,
115-134.


9.4 Arnoldi and Unsymmetric Lanczos


If $A$ is not symmetric, then the orthogonal tridiagonalization $Q^T A Q = T$
does not exist in general. There are two ways to proceed. The Arnoldi
approach involves the column-by-column generation of an orthogonal $Q$
such that $Q^T A Q = H$ is the Hessenberg reduction of §7.4. The unsymmetric
Lanczos approach computes the columns of $Q = [\,q_1,\ldots,q_n\,]$ and
$P = [\,p_1,\ldots,p_n\,]$ so that $P^T A Q = T$ is tridiagonal and $P^T Q = I_n$. Both
methods are interesting as large sparse unsymmetric eigenvalue solvers and
both can be adapted for sparse unsymmetric $Ax = b$ solving. (See §10.4.)


9.4.1 The Basic Arnoldi Iteration 


One way to extend the Lanczos process to unsymmetric matrices is due to
Arnoldi (1951) and revolves around the Hessenberg reduction $Q^T A Q = H$.
In particular, if $Q = [\,q_1,\ldots,q_n\,]$ and we compare columns in $AQ = QH$,
then

\[ A q_k = \sum_{i=1}^{k+1} h_{ik} q_i, \qquad 1 \le k \le n-1. \]

Isolating the last term in the summation gives

\[ h_{k+1,k}\, q_{k+1} = A q_k - \sum_{i=1}^{k} h_{ik} q_i \equiv r_k \]

where $h_{ik} = q_i^T A q_k$ for $i = 1{:}k$. It follows that if $r_k \neq 0$, then $q_{k+1}$ is
specified by

\[ q_{k+1} = r_k / h_{k+1,k} \]

where $h_{k+1,k} = \| r_k \|_2$. These equations define the Arnoldi process and in
strict analogy to the symmetric Lanczos process (9.1.3) we obtain:


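    $r_0 = q_1$
    $h_{10} = 1$
    $k = 0$
    while ($h_{k+1,k} \neq 0$)
        $q_{k+1} = r_k / h_{k+1,k}$
        $k = k+1$
        $r_k = A q_k$                                   (9.4.1)
        for $i = 1{:}k$
            $h_{ik} = q_i^T r_k$
            $r_k = r_k - h_{ik} q_i$
        end
        $h_{k+1,k} = \| r_k \|_2$
    end

We assume that $q_1$ is a given unit 2-norm starting vector. The $q_k$ are called
the Arnoldi vectors and they define an orthonormal basis for the Krylov
subspace $\mathcal{K}(A, q_1, k)$:

\[ \mathrm{span}\{q_1,\ldots,q_k\} = \mathrm{span}\{q_1, Aq_1, \ldots, A^{k-1}q_1\}. \qquad (9.4.2) \]

The situation after $k$ steps is summarized by the $k$-step Arnoldi factorization

\[ A Q_k = Q_k H_k + r_k e_k^T \qquad (9.4.3) \]

where $Q_k = [\,q_1,\ldots,q_k\,]$, $e_k = I_k(:,k)$, and

\[
H_k = \begin{bmatrix}
h_{11} & h_{12} & \cdots & \cdots & h_{1k} \\
h_{21} & h_{22} & \cdots & \cdots & h_{2k} \\
0 & h_{32} & \ddots & & \vdots \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & h_{k,k-1} & h_{kk}
\end{bmatrix}.
\]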

If $r_k = 0$, then the columns of $Q_k$ define an invariant subspace and $\lambda(H_k) \subseteq
\lambda(A)$. Otherwise, the focus is on how to extract information about $A$'s
eigensystem from the Hessenberg matrix $H_k$ and the matrix $Q_k$ of Arnoldi
vectors.

If $y \in \mathbb{R}^k$ is a unit 2-norm eigenvector for $H_k$ and $H_k y = \lambda y$, then from
(9.4.3)

\[ (A - \lambda I)x = (e_k^T y)\, r_k \]

where $x = Q_k y$. We call $\lambda$ a Ritz value and $x$ the corresponding Ritz
vector. The size of $|e_k^T y|\,\| r_k \|_2$ can be used to obtain error bounds, although
the relevant perturbation theorems are not as routine to apply as in the
symmetric case.
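
For completeness, the residual identity for a Ritz pair can be spelled out in one line. Multiplying (9.4.3) on the right by $y$ and using $H_k y = \lambda y$ and $x = Q_k y$ gives:

```latex
\[
A x = A Q_k y = Q_k H_k y + r_k e_k^T y
    = \lambda Q_k y + (e_k^T y)\, r_k
    = \lambda x + (e_k^T y)\, r_k ,
\qquad\text{so}\qquad
\| (A - \lambda I) x \|_2 = |e_k^T y| \, \| r_k \|_2 .
\]
```

Thus the residual norm of a Ritz pair is available from the last component of $y$ and the norm of $r_k$, without forming $x$.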

Some numerical properties of the Arnoldi iteration are discussed in
Wilkinson (1965, p.382). As with the symmetric Lanczos iteration, loss
of orthogonality among the $q_i$ is an issue. But two other features of (9.4.1)
must be addressed before a practical Arnoldi eigensolver can be obtained:


• The Arnoldi vectors $q_1,\ldots,q_k$ are referenced in step $k$ and the
computation of $H_k(1{:}k,k)$ involves $O(kn)$ flops. Thus, there is a steep
penalty associated with the generation of long Arnoldi sequences.


• The eigenvalues of $H_k$ do not approximate the eigenvalues of $A$ in the
style of Kaniel and Paige. This is in contrast to the symmetric case
where information about $A$'s extremal eigenvalues emerges quickly.
With Arnoldi, the early extraction of eigenvalue information depends
crucially on the choice of $q_1$.


These realities suggest a framework in which we use Arnoldi with repeated,
carefully chosen restarts and a controlled iteration maximum. (Recall the
$s$-step Lanczos process of §9.2.7.)


9.4.2 Arnoldi with Restarting


Consider running Arnoldi for $m$ steps and then restarting the process with
a vector $q_+$ chosen from the span of the Arnoldi vectors $q_1,\ldots,q_m$. Because
of the Krylov connection (9.4.2), $q_+$ has the form

\[ q_+ = p(A)\, q_1 \]

for some polynomial of degree $m-1$. If $Av_i = \lambda_i v_i$ for $i = 1{:}n$ and $q_1$ has
the eigenvector expansion

\[ q_1 = a_1 v_1 + \cdots + a_n v_n, \]

then

\[ q_+ = a_1 p(\lambda_1) v_1 + \cdots + a_n p(\lambda_n) v_n. \]

Note that $\mathcal{K}(A, q_+, m)$ is rich in eigenvectors that are emphasized by $p(\lambda)$.
That is, if $p(\lambda_{wanted})$ is large compared to $p(\lambda_{unwanted})$, then the Krylov
space $\mathcal{K}(A, q_+, m)$ will have much better approximations to the eigenvector
$x_{wanted}$ than to the eigenvector $x_{unwanted}$. (It is possible to couch this
argument in terms of Schur vectors and invariant subspaces rather than in
terms of particular eigenvectors.)

Thus the act of picking a good restart vector $q_+$ from $\mathcal{K}(A, q_1, m)$ is the
act of picking a polynomial “filter” that tunes out unwanted portions of the
spectrum. Various heuristics for doing this have been developed based on
computed Ritz vectors. See Saad (1980, 1984, 1992).

We describe a method due to Sorensen (1992) that determines the
restart vector implicitly using the QR iteration with shifts. The restart
occurs after every $m$ steps and we assume that $m > j$ where $j$ is the
number of sought-after eigenvalues. The choice of the Arnoldi length parameter
$m$ depends on the problem dimension $n$, the effect of orthogonality loss, and
system storage constraints.

After $m$ steps we have the Arnoldi factorization

\[ A Q_c = Q_c H_c + r_c e_m^T, \]

where $Q_c \in \mathbb{R}^{n \times m}$ has orthonormal columns, $H_c \in \mathbb{R}^{m \times m}$ is upper
Hessenberg, and $Q_c^T r_c = 0$. The subscript “c” stands for “current.” The QR
iteration with shifts is then applied to $H_c$:

    $H^{(1)} = H_c$
    for $i = 1{:}p$
        $H^{(i)} - \mu_i I = V_i R_i$
        $H^{(i+1)} = R_i V_i + \mu_i I$
    end
    $H_+ = H^{(p+1)}$

Here $p = m - j$ and it is assumed that the implicitly shifted QR process of
§7.5.5 is applied. The selection of the shifts will be discussed shortly.
The orthogonal matrix $V = V_1 \cdots V_p$ has three crucial properties:


(1) $H_+ = V^T H_c V$. This is because $V_i^T H^{(i)} V_i = H^{(i+1)}$.


(2) $[V]_{mi} = 0$ for $i = 1{:}j-1$. This is because each $V_i$ is upper Hessenberg
and so $V \in \mathbb{R}^{m \times m}$ has lower bandwidth $p = m - j$.


(3) The first column of $V$ has the form

\[ V e_1 = \alpha (H_c - \mu_p I)(H_c - \mu_{p-1} I) \cdots (H_c - \mu_1 I)\, e_1 \qquad (9.4.4) \]

where $\alpha$ is a scalar.

To be convinced of property (3), consider the $p = 2$ case:

\[
\begin{aligned}
V R_2 R_1 &= V_1 (V_2 R_2) R_1 = V_1 (H^{(2)} - \mu_2 I) R_1 \\
 &= V_1 (V_1^T H^{(1)} V_1 - \mu_2 I) R_1 = (H^{(1)} - \mu_2 I) V_1 R_1 \\
 &= (H^{(1)} - \mu_2 I)(H^{(1)} - \mu_1 I) = (H_c - \mu_2 I)(H_c - \mu_1 I).
\end{aligned}
\]

Since $R_2 R_1$ is upper triangular, the first column of $V = V_1 V_2$ is a multiple
of $(H_c - \mu_2 I)(H_c - \mu_1 I)\, e_1$.

We now show how to restart the Arnoldi process using the matrix $V$ to
implicitly select the new starting vector. From (1) we obtain the following
transformation of (9.4.3):

\[ A Q_+ = Q_+ H_+ + r_c e_m^T V \]

where $Q_+ = Q_c V$. This is not a new length-$m$ Arnoldi factorization because
$e_m^T V$ is not a multiple of $e_m^T$. However, in view of property (2),

\[ A Q_+(:, 1{:}j) = Q_+(:, 1{:}j)\, H_+(1{:}j, 1{:}j) + v_{mj}\, r_c e_j^T \qquad (9.4.5) \]

is a length-$j$ Arnoldi factorization. By “jumping into” the basic Arnoldi
iteration at step $j+1$ and performing $p$ steps, we can extend (9.4.5) to a new
length-$m$ Arnoldi factorization. Moreover, using property (3) the associated
starting vector $q_1^{(new)} = Q_+(:,1)$ has the following characterization:

\[
\begin{aligned}
Q_+(:,1) = Q_c V e_1 &= \alpha\, Q_c (H_c - \mu_p I) \cdots (H_c - \mu_1 I)\, e_1 \\
 &= \alpha\, (A - \mu_p I) \cdots (A - \mu_1 I)\, Q_c e_1.
\end{aligned}
\qquad (9.4.6)
\]

The last equation follows from the identity

\[ (A - \mu I) Q_c = Q_c (H_c - \mu I) + r_c e_m^T \]

and the fact that $e_m^T f(H_c)\, e_1 = 0$ for any polynomial $f(\cdot)$ of degree $p-1$ or
less.

Thus, $q_1^{(new)} = p(A)\, q_1$ where $p(\lambda)$ is the polynomial

\[ p(\lambda) = (\lambda - \mu_1)(\lambda - \mu_2) \cdots (\lambda - \mu_p). \]

This shows that the shifts are the zeros of the filtering polynomial. One
interesting choice for the shifts is to compute $\lambda(H_c)$ and to identify the
eigenvalues of interest $\lambda_1,\ldots,\lambda_j$:

\[ \lambda(H_c) = \{\lambda_1,\ldots,\lambda_j\} \cup \{\lambda_{j+1},\ldots,\lambda_m\}. \]

Setting $\mu_i = \lambda_{j+i}$ for $i = 1{:}p$ is one way of generating a filter polynomial
that de-emphasizes the unwanted portion of the spectrum.
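
The effect of the filter can be seen on the eigenvector coefficients alone. The sketch below is purely illustrative: it assumes a diagonalizable $A$ whose eigenvalues are known exactly and uses two of them as shifts, whereas in practice the shifts are Ritz values that only approximate eigenvalues. The function name `filtered_coefficients` and the sample numbers are hypothetical.

```python
def filtered_coefficients(eigs, coeffs, shifts):
    """Coefficients a_i * p(lambda_i) of p(A) q_1 in the eigenvector basis,
    where p(z) is the product of (z - mu) over the given shifts."""
    out = []
    for lam, a in zip(eigs, coeffs):
        p = 1.0
        for mu in shifts:
            p *= lam - mu
        out.append(a * p)
    return out

eigs = [10.0, 9.0, 1.0, 0.5]        # assumed spectrum: two wanted, two unwanted
coeffs = [0.5, 0.5, 0.5, 0.5]       # expansion coefficients of q_1
new = filtered_coefficients(eigs, coeffs, shifts=[1.0, 0.5])
```

With shifts placed exactly on the unwanted eigenvalues, the corresponding components vanish while the wanted components are amplified. With approximate shifts the unwanted components would be damped rather than annihilated.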

We have just presented the rudiments of the implicitly restarted Arnoldi 
method. It has many attractive attributes. For implementation details and 
further analysis, see Lehoucq and Sorensen (1996) and Morgan (1996). 


9.4.3 Unsymmetric Lanczos Tridiagonalization 


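Another way to extend the symmetric Lanczos process is to reduce $A$
to tridiagonal form using a general similarity transformation. Suppose
$A \in \mathbb{R}^{n \times n}$ and that a nonsingular matrix $Q$ exists so

\[
Q^{-1} A Q = T = \begin{bmatrix}
\alpha_1 & \gamma_1 & \cdots & 0 \\
\beta_1 & \alpha_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \gamma_{n-1} \\
0 & \cdots & \beta_{n-1} & \alpha_n
\end{bmatrix}.
\]

With the column partitionings

\[ Q = [\,q_1,\ldots,q_n\,], \qquad Q^{-T} = P = [\,p_1,\ldots,p_n\,] \]

we find upon comparing columns in $AQ = QT$ and $A^T P = P T^T$ that

\[
\begin{aligned}
A q_k &= \gamma_{k-1} q_{k-1} + \alpha_k q_k + \beta_k q_{k+1}, & \gamma_0 q_0 &\equiv 0 \\
A^T p_k &= \beta_{k-1} p_{k-1} + \alpha_k p_k + \gamma_k p_{k+1}, & \beta_0 p_0 &\equiv 0
\end{aligned}
\]

for $k = 1{:}n-1$. These equations together with the biorthogonality condition
$P^T Q = I_n$ imply

\[ \alpha_k = p_k^T A q_k \]

and

\[
\begin{aligned}
\beta_k q_{k+1} &\equiv r_k = (A - \alpha_k I)\, q_k - \gamma_{k-1} q_{k-1} \\
\gamma_k p_{k+1} &\equiv s_k = (A - \alpha_k I)^T p_k - \beta_{k-1} p_{k-1}.
\end{aligned}
\]

There is some flexibility in choosing the scale factors $\beta_k$ and $\gamma_k$. Note that

\[ 1 = p_{k+1}^T q_{k+1} = (s_k/\gamma_k)^T (r_k/\beta_k). \]

It follows that once $\beta_k$ is specified $\gamma_k$ is given by

\[ \gamma_k = s_k^T r_k / \beta_k. \]

With the “canonical” choice $\beta_k = \| r_k \|_2$ we obtain

    $q_1$, $p_1$ given unit 2-norm vectors with $p_1^T q_1 \neq 0$.
    $k = 0$
    $q_0 = 0$; $r_0 = q_1$
    $p_0 = 0$; $s_0 = p_1$
    while ($r_k \neq 0$) $\wedge$ ($s_k \neq 0$) $\wedge$ ($s_k^T r_k \neq 0$)
        $\beta_k = \| r_k \|_2$
        $\gamma_k = s_k^T r_k / \beta_k$
        $q_{k+1} = r_k / \beta_k$
        $p_{k+1} = s_k / \gamma_k$
        $k = k+1$                                        (9.4.7)
        $\alpha_k = p_k^T A q_k$
        $r_k = (A - \alpha_k I)\, q_k - \gamma_{k-1} q_{k-1}$
        $s_k = (A - \alpha_k I)^T p_k - \beta_{k-1} p_{k-1}$
    end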
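
A direct transcription of (9.4.7) for small dense matrices might look as follows. This is a sketch under stated assumptions: the names `matvec`, `rmatvec`, `dot`, and `unsym_lanczos`, the `steps` limit, the tolerance `tol` standing in for the three exact tests in the while-condition, and the returned `status` string are all introduced here for illustration. No look-ahead is attempted, so the routine simply stops when $s_k^T r_k$ is negligible.

```python
import math

def matvec(A, x):
    """Return A x for a dense matrix stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def rmatvec(A, y):
    """Return A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def unsym_lanczos(A, q1, p1, steps, tol=1e-12):
    """Return (alphas, betas, gammas, status) where alphas[k-1] = alpha_k,
    betas[k] = beta_k and gammas[k] = gamma_k (so index 0 holds beta_0, gamma_0)."""
    n = len(A)
    q_prev, p_prev = [0.0] * n, [0.0] * n        # q_0 = 0, p_0 = 0
    r, s = list(q1), list(p1)                    # r_0 = q_1, s_0 = p_1
    alphas, betas, gammas = [], [], []
    status = "step limit reached"
    for _ in range(steps):
        beta = math.sqrt(dot(r, r))
        if beta <= tol or math.sqrt(dot(s, s)) <= tol:
            status = "invariant subspace"
            break
        sr = dot(s, r)
        if abs(sr) <= tol:
            status = "serious breakdown"
            break
        gamma = sr / beta
        q = [t / beta for t in r]                # q_{k+1}
        p = [t / gamma for t in s]               # p_{k+1}
        Aq = matvec(A, q)
        alpha = dot(p, Aq)
        r = [a - alpha * t - gamma * u for a, t, u in zip(Aq, q, q_prev)]
        s = [a - alpha * t - beta * u
             for a, t, u in zip(rmatvec(A, p), p, p_prev)]
        alphas.append(alpha)
        betas.append(beta)
        gammas.append(gamma)
        q_prev, p_prev = q, p
    return alphas, betas, gammas, status

# A is already tridiagonal, so with q_1 = p_1 = e_1 the process should reproduce it.
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 2.0, 4.0]]
alphas, betas, gammas, status = unsym_lanczos(A, [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 5)
```

In this deliberately simple test case the computed $\alpha_k$, $\beta_k$, $\gamma_k$ are the diagonal, subdiagonal, and superdiagonal of $A$, and the loop stops after three steps because $r_3 = 0$.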
If

\[
T_k = \begin{bmatrix}
\alpha_1 & \gamma_1 & \cdots & 0 \\
\beta_1 & \alpha_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \gamma_{k-1} \\
0 & \cdots & \beta_{k-1} & \alpha_k
\end{bmatrix},
\]

then the situation at the bottom of the loop is summarized by the equations

\[ A[\,q_1,\ldots,q_k\,] = [\,q_1,\ldots,q_k\,]\, T_k + r_k e_k^T \qquad (9.4.8) \]

\[ A^T[\,p_1,\ldots,p_k\,] = [\,p_1,\ldots,p_k\,]\, T_k^T + s_k e_k^T. \qquad (9.4.9) \]

If $r_k = 0$, then the iteration terminates and span$\{q_1,\ldots,q_k\}$ is an
invariant subspace for $A$. If $s_k = 0$, then the iteration also terminates and
span$\{p_1,\ldots,p_k\}$ is an invariant subspace for $A^T$. However, if neither of
these conditions is true and $s_k^T r_k = 0$, then the tridiagonalization process
ends without any invariant subspace information. This is called serious
breakdown. See Wilkinson (1965, p.389) for an early discussion of the matter.


9.4.4 The Look-Ahead Idea


It is interesting to look at the serious breakdown issue in the block version
of (9.4.7). For clarity assume that $A \in \mathbb{R}^{n \times n}$ with $n = rp$. Consider the
factorization

\[
P^T A Q = \begin{bmatrix}
M_1 & C_1^T & \cdots & 0 \\
B_1 & M_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & C_{r-1}^T \\
0 & \cdots & B_{r-1} & M_r
\end{bmatrix} \qquad (9.4.10)
\]

where all the blocks are $p$-by-$p$. Let $Q = [\,Q_1,\ldots,Q_r\,]$ and $P = [\,P_1,\ldots,P_r\,]$
be conformable partitionings of $Q$ and $P$. Comparing block columns in the
equations $AQ = QT$ and $A^T P = P T^T$ we obtain

\[
\begin{aligned}
Q_{k+1} B_k &= A Q_k - Q_k M_k - Q_{k-1} C_{k-1}^T \equiv R_k \\
P_{k+1} C_k &= A^T P_k - P_k M_k^T - P_{k-1} B_{k-1}^T \equiv S_k.
\end{aligned}
\]

Note that $M_k = P_k^T A Q_k$. If $S_k^T R_k \in \mathbb{R}^{p \times p}$ is nonsingular and we compute
$B_k, C_k \in \mathbb{R}^{p \times p}$ so that

\[ C_k^T B_k = S_k^T R_k, \]

then

\[ Q_{k+1} = R_k B_k^{-1} \qquad (9.4.11) \]

\[ P_{k+1} = S_k C_k^{-1} \qquad (9.4.12) \]

satisfy $P_{k+1}^T Q_{k+1} = I_p$. Serious breakdown in this setting is associated with
having a singular $S_k^T R_k$.

One way of solving the serious breakdown problem in (9.4.7) is to go
after a factorization of the form (9.4.10) in which the block sizes are
dynamically determined. Roughly speaking, in this approach matrices $Q_{k+1}$ and
$P_{k+1}$ are built up column by column with special recursions that culminate
in the production of a nonsingular $P_{k+1}^T Q_{k+1}$. The computations are
arranged so that the biorthogonality conditions $P_i^T Q_{k+1} = 0$ and $Q_i^T P_{k+1} = 0$
hold for $i = 1{:}k$.

A method of this form belongs to the family of look-ahead Lanczos
methods. The length of a look-ahead step is the width of the $Q_{k+1}$ and $P_{k+1}$
that it produces. If that width is one, a conventional block Lanczos step
may be taken. Length-2 look-ahead steps are discussed in Parlett, Taylor
and Liu (1985). The notion of incurable breakdown is also presented by these
authors. Freund, Gutknecht, and Nachtigal (1993) cover the general case
along with a host of implementation details. Floating point considerations
require the handling of “near” serious breakdown. In practice, each $M_k$ that
is 2-by-2 or larger corresponds to an instance of near serious breakdown.


Problems 


P9.4.1 Prove that the Arnoldi vectors in (9.4.1) are mutually orthogonal. 
P9.4.2 Prove (9.4.4). 
P9.4.3 Prove (9.4.6). 


P9.4.4 Give an example of a starting vector for which the unsymmetric Lanczos iteration
(9.4.7) breaks down without rendering any invariant subspace information. Use

\[ A = \begin{bmatrix} 1 & 6 & 2 \\ 3 & 0 & 2 \\ 1 & 3 & 5 \end{bmatrix}. \]


P9.4.5 Suppose $H \in \mathbb{R}^{n \times n}$ is upper Hessenberg. Discuss the computation of a unit
upper triangular matrix $U$ such that $HU = UT$ where $T$ is tridiagonal.


P9.4.6 Show that the QR algorithm for eigenvalues does not preserve tridiagonal struc- 
ture in the unsymmetric case. 


Notes and References for Sec. 9.4


Chapter 10


Iterative Methods for
Linear Systems


§10.1 The Standard Iterations
§10.2 The Conjugate Gradient Method
§10.3 Preconditioned Conjugate Gradients
§10.4 Other Krylov Subspace Methods


We concluded the previous chapter by showing how the Lanczos it- 
eration could be used to solve various linear equation and least squares 
problems. The methods developed were suitable for large sparse problems 
because they did not require the factorization of the underlying matrix. In 
this section, we continue the discussion of linear equation solvers that have 
this property. 

The first section is a brisk review of the classical iterations: Jacobi, 
Gauss-Seidel, SOR, Chebyshev semi-iterative, and so on. Our treatment of 
these methods is brief because our principal aim in this chapter is to high- 
light the method of conjugate gradients. In §10.2, we carefully develop this 
important technique in a natural way from the method of steepest descent. 
Recall that the conjugate gradient method has already been introduced via 
the Lanczos iteration in §9.3. The reason for deriving the method again is 
to motivate some of its practical variants, which are the subject of §10.3. 
Extensions to unsymmetric problems are treated in §10.4. 

We warn the reader of an inconsistency in the notation of this chapter.
In §10.1, methods are developed at the “(i, j) level” necessitating the use of
superscripts: $x_i^{(k)}$ denotes the $i$-th component of a vector $x^{(k)}$. In the other
sections, however, algorithmic developments can proceed without explicit
mention of vector/matrix entries. Hence, in §10.2-§10.4 we dispense with
superscripts and denote vector sequences by $\{x_k\}$.


Before You Begin 


Chapter 1, §§2.1-2.5, and §2.7, Chapter 3, and §§4.1-4.3 are assumed. 
Other dependencies include: 


Chapter 9
↓
§10.1 → §10.2 → §10.3 → §10.4
↑
§7.4


Texts devoted to iterative solvers include Varga (1962), Young (1971), 
Hageman and Young (1981), and Axelsson (1994). The software "tem- 
plates" volume by Barrett et al (1993) is particularly useful. The direct 
(non-iterative) solution of large sparse systems is sometimes preferred. See 
George and Liu (1981) and Duff, Erisman, and Reid (1986). 


10.1 The Standard Iterations 


The linear equation solvers in Chapters 3 and 4 involve the factorization 
of the coefficient matrix A. Methods of this type are called direct methods. 
Direct methods can be impractical if A is large and sparse, because the 
sought-after factors can be dense. An exception to this occurs when A is 
banded (cf. §4.3). Yet in many band matrix problems even the band itself 
is sparse making algorithms such as band Cholesky difficult to implement. 

One reason for the great interest in sparse linear equation solvers is the 
importance of being able to obtain numerical solutions to partial differ- 
ential equations. Indeed, researchers in computational PDE’s have been 
responsible for many of the sparse matrix techniques that are presently in 
general use. 

Roughly speaking, there are two approaches to the sparse $Ax = b$
problem. One is to pick an appropriate direct method and adapt it to exploit
$A$’s sparsity. Typical adaptation strategies involve the intelligent use of
data structures and special pivoting strategies that minimize fill-in.

In contrast to the direct methods are the iterative methods. These
methods generate a sequence of approximate solutions $\{x^{(k)}\}$ and essentially
involve the matrix $A$ only in the context of matrix-vector multiplication.
The evaluation of an iterative method invariably focuses on how quickly the
iterates $x^{(k)}$ converge. In this section, we present some basic iterative
methods, discuss their practical implementation, and prove a few representative
theorems concerned with their behavior.


10.1.1 The Jacobi and Gauss-Seidel Iterations


Perhaps the simplest iterative scheme is the Jacobi iteration. It is defined
for matrices that have nonzero diagonal elements. The method can be
motivated by rewriting the 3-by-3 system $Ax = b$ as follows:

\[
\begin{aligned}
x_1 &= (b_1 - a_{12}x_2 - a_{13}x_3)/a_{11} \\
x_2 &= (b_2 - a_{21}x_1 - a_{23}x_3)/a_{22} \\
x_3 &= (b_3 - a_{31}x_1 - a_{32}x_2)/a_{33}
\end{aligned}
\]

Suppose $x^{(k)}$ is an approximation to $x = A^{-1}b$. A natural way to generate
a new approximation $x^{(k+1)}$ is to compute

\[
\begin{aligned}
x_1^{(k+1)} &= (b_1 - a_{12}x_2^{(k)} - a_{13}x_3^{(k)})/a_{11} \\
x_2^{(k+1)} &= (b_2 - a_{21}x_1^{(k)} - a_{23}x_3^{(k)})/a_{22} \\
x_3^{(k+1)} &= (b_3 - a_{31}x_1^{(k)} - a_{32}x_2^{(k)})/a_{33}
\end{aligned}
\qquad (10.1.1)
\]

This defines the Jacobi iteration for the case $n = 3$. For general $n$ we have

    for $i = 1{:}n$
        $x_i^{(k+1)} = \Bigl( b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)} \Bigr) \Big/ a_{ii}$          (10.1.2)
    end
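
A dense sketch of one Jacobi sweep (10.1.2) is given below. The function name `jacobi_step` and the 2-by-2 test system are assumptions chosen for illustration; the right-hand side is picked so that the exact solution is $[1, 2]^T$.

```python
def jacobi_step(A, b, x):
    """One sweep of (10.1.2): every component uses only the old iterate x."""
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

A = [[4.0, 1.0], [2.0, 5.0]]
b = [6.0, 12.0]              # chosen so that the exact solution is [1, 2]
first = jacobi_step(A, b, [0.0, 0.0])
x = [0.0, 0.0]
for _ in range(60):
    x = jacobi_step(A, b, x)
```

Because the new iterate is built entirely from the old one, the components can be updated in any order, or simultaneously.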


Note that in the Jacobi iteration one does not use the most recently
available information when computing $x_i^{(k+1)}$. For example, $x_1^{(k)}$ is used in the
calculation of $x_2^{(k+1)}$ even though component $x_1^{(k+1)}$ is known. If we revise
the Jacobi iteration so that we always use the most current estimate of the
exact $x_i$ then we obtain

    for $i = 1{:}n$
        $x_i^{(k+1)} = \Bigl( b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)} \Bigr) \Big/ a_{ii}$          (10.1.3)
    end

This defines what is called the Gauss-Seidel iteration.
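
The only change relative to a Jacobi sweep is that updated components are used as soon as they are available, which overwriting the iterate in place achieves automatically. The sketch below uses a hypothetical function name `gauss_seidel_step` and the same small test system as an assumed example.

```python
def gauss_seidel_step(A, b, x):
    """One sweep of (10.1.3): x is updated in place, so entries j < i
    already hold the new values when component i is computed."""
    x = list(x)                      # work on a copy of the caller's iterate
    n = len(b)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x

A = [[4.0, 1.0], [2.0, 5.0]]
b = [6.0, 12.0]                      # exact solution is [1, 2]
first = gauss_seidel_step(A, b, [0.0, 0.0])
x = [0.0, 0.0]
for _ in range(40):
    x = gauss_seidel_step(A, b, x)
```

Unlike Jacobi, the result of a sweep depends on the order in which the components are visited.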
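For both the Jacobi and Gauss-Seidel iterations, the transition from
$x^{(k)}$ to $x^{(k+1)}$ can be succinctly described in terms of the matrices $L$, $D$,
and $U$ defined by:

\[
L = \begin{bmatrix}
0 & 0 & \cdots & \cdots & 0 \\
a_{21} & 0 & \cdots & \cdots & 0 \\
a_{31} & a_{32} & 0 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{n,n-1} & 0
\end{bmatrix}
\]

\[ D = \mathrm{diag}(a_{11},\ldots,a_{nn}) \qquad (10.1.4) \]

\[
U = \begin{bmatrix}
0 & a_{12} & \cdots & \cdots & a_{1n} \\
0 & 0 & \cdots & \cdots & a_{2n} \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 0 & a_{n-1,n} \\
0 & 0 & \cdots & 0 & 0
\end{bmatrix}
\]

In particular, the Jacobi step has the form $M_J x^{(k+1)} = N_J x^{(k)} + b$ where
$M_J = D$ and $N_J = -(L+U)$. On the other hand, Gauss-Seidel is defined
by $M_G x^{(k+1)} = N_G x^{(k)} + b$ with $M_G = (D + L)$ and $N_G = -U$.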


10.1.2  Splittings and Convergence 


The Jacobi and Gauss-Seidel procedures are typical members of a large 
family of iterations that have the form 


\[ M x^{(k+1)} = N x^{(k)} + b \qquad (10.1.5) \]

where $A = M - N$ is a splitting of the matrix $A$. For the iteration (10.1.5)
to be practical, it must be “easy” to solve a linear system with $M$ as the
matrix. Note that for Jacobi and Gauss-Seidel, $M$ is diagonal and lower
triangular respectively.

Whether or not (10.1.5) converges to $x = A^{-1}b$ depends upon the
eigenvalues of $M^{-1}N$. In particular, if the spectral radius of an $n$-by-$n$ matrix
$G$ is defined by

\[ \rho(G) = \max\{\, |\lambda| : \lambda \in \lambda(G) \,\}, \]

then it is the size of $\rho(M^{-1}N)$ that is critical to the success of (10.1.5).


Theorem 10.1.1 Suppose $b \in \mathbb{R}^n$ and $A = M - N \in \mathbb{R}^{n \times n}$ is nonsingular.
If $M$ is nonsingular and the spectral radius of $M^{-1}N$ satisfies the
inequality $\rho(M^{-1}N) < 1$, then the iterates $x^{(k)}$ defined by $Mx^{(k+1)} =
Nx^{(k)} + b$ converge to $x = A^{-1}b$ for any starting vector $x^{(0)}$.


Proof. Let $e^{(k)} = x^{(k)} - x$ denote the error in the $k$th iterate. Since $Mx
= Nx + b$ it follows that $M(x^{(k+1)} - x) = N(x^{(k)} - x)$, and thus, the error in
$x^{(k+1)}$ is given by $e^{(k+1)} = M^{-1}N e^{(k)} = (M^{-1}N)^{k+1} e^{(0)}$. From Lemma
7.3.2 we know that $(M^{-1}N)^k \to 0$ iff $\rho(M^{-1}N) < 1$. □


This result is central to the study of iterative methods where algorithmic 
development typically proceeds along the following lines: 


• A splitting $A = M - N$ is proposed where linear systems of the form
$Mz = d$ are “easy” to solve.

• Classes of matrices are identified for which the iteration matrix $G =
M^{-1}N$ satisfies $\rho(G) < 1$.

• Further results about $\rho(G)$ are established to gain intuition about
how the error $e^{(k)}$ tends to zero.


For example, consider the Jacobi iteration, $Dx^{(k+1)} = -(L + U)x^{(k)} + b$.
One condition that guarantees $\rho(M_J^{-1}N_J) < 1$ is strict diagonal dominance.
Indeed, if $A$ has that property (defined in §3.4.10), then

\[
\rho(M_J^{-1}N_J) \le \| D^{-1}(L+U) \|_\infty
 = \max_{1 \le i \le n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \left| \frac{a_{ij}}{a_{ii}} \right| < 1.
\]

Usually, the “more dominant” the diagonal the more rapid the convergence
but there are counterexamples. See P10.1.7.
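
The bound above is cheap to evaluate directly from the entries of $A$. The helper below is a small sketch; the name `jacobi_inf_norm` and the sample matrix are assumptions made for illustration.

```python
def jacobi_inf_norm(A):
    """Infinity norm of D^{-1}(L+U): the largest row sum of |a_ij / a_ii|, j != i.
    A value below 1 certifies convergence of the Jacobi iteration."""
    n = len(A)
    return max(sum(abs(A[i][j]) for j in range(n) if j != i) / abs(A[i][i])
               for i in range(n))

bound = jacobi_inf_norm([[4.0, 1.0], [2.0, 5.0]])   # rows give 1/4 and 2/5
```

The value is only an upper bound on the spectral radius, so a result of 1 or more does not by itself imply divergence.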

A more complicated spectral radius argument is needed to show that 
Gauss-Seidel converges for symmetric positive definite A. 


Theorem 10.1.2 If $A \in \mathbb{R}^{n \times n}$ is symmetric and positive definite, then the
Gauss-Seidel iteration (10.1.3) converges for any $x^{(0)}$.


Proof. Write $A = L + D + L^T$ where $D = \mathrm{diag}(a_{ii})$ and $L$ is strictly lower
triangular. In light of Theorem 10.1.1 our task is to show that the matrix
$G = -(D + L)^{-1}L^T$ has eigenvalues that are inside the unit circle. Since
$D$ is positive definite we have $G_1 \equiv D^{1/2} G D^{-1/2} = -(I + L_1)^{-1} L_1^T$,
where $L_1 = D^{-1/2} L D^{-1/2}$. Since $G$ and $G_1$ have the same eigenvalues,
we must verify that $\rho(G_1) < 1$. If $G_1 x = \lambda x$ with $x^H x = 1$, then we
have $-L_1^T x = \lambda(I + L_1)x$ and thus, $-x^H L_1^T x = \lambda(1 + x^H L_1 x)$. Letting
$a + bi = x^H L_1 x$ we have

\[
|\lambda|^2 = \left| \frac{-a + bi}{1 + a + bi} \right|^2 = \frac{a^2 + b^2}{1 + 2a + a^2 + b^2}.
\]

However, since $D^{-1/2} A D^{-1/2} = I + L_1 + L_1^T$ is positive definite, it is not
hard to show that $0 < 1 + x^H L_1 x + x^H L_1^T x = 1 + 2a$ implying $|\lambda| < 1$. □


This result is frequently applicable because many of the matrices that arise 
from discretized elliptic PDE’s are symmetric positive definite. Numerous 
other results of this flavor appear in the literature. 


10.1.3 Practical Implementation of Gauss-Seidel 


We now focus on some practical details associated with the Gauss-Seidel
iteration. With overwriting the Gauss-Seidel step (10.1.3) is particularly
simple to implement:

    for $i = 1{:}n$
        $x_i = \Bigl( b_i - \sum_{j=1}^{i-1} a_{ij}x_j - \sum_{j=i+1}^{n} a_{ij}x_j \Bigr) \Big/ a_{ii}$
    end

This computation requires about twice as many flops as there are nonzero
entries in the matrix $A$. It makes no sense to be more precise about the
work involved because the actual implementation depends greatly upon the
structure of the problem at hand.

In order to stress this point we consider the application of (10.1.3) to
the $NM$-by-$NM$ block tridiagonal system

\[
\begin{bmatrix}
T & -I_N & \cdots & 0 \\
-I_N & T & \ddots & \vdots \\
\vdots & \ddots & \ddots & -I_N \\
0 & \cdots & -I_N & T
\end{bmatrix}
\begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_M \end{bmatrix}
=
\begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_M \end{bmatrix}
\qquad (10.1.6)
\]

where

\[
T = \begin{bmatrix}
4 & -1 & \cdots & 0 \\
-1 & 4 & \ddots & \vdots \\
\vdots & \ddots & \ddots & -1 \\
0 & \cdots & -1 & 4
\end{bmatrix},
\qquad
g_j = \begin{bmatrix} G(1,j) \\ G(2,j) \\ \vdots \\ G(N,j) \end{bmatrix},
\qquad
f_j = \begin{bmatrix} F(1,j) \\ F(2,j) \\ \vdots \\ F(N,j) \end{bmatrix}.
\]


This problem arises when the Poisson equation is discretized on a rectangle. 
It is easy to show that the matrix A is positive definite. 

With the convention that $G(i,j) = 0$ whenever $i \in \{0, N+1\}$ or
$j \in \{0, M+1\}$ we see that with overwriting the Gauss-Seidel step takes on
the form:

    for $j = 1{:}M$
        for $i = 1{:}N$
            $G(i,j) = (F(i,j) + G(i-1,j) + G(i+1,j) + G(i,j-1) + G(i,j+1))/4$
        end
    end
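
The double loop above can be sketched as follows. The boundary convention is realized here by padding the arrays with a ring of zeros, which is one convenient choice and an assumption of this example, as are the function name `poisson_gs_sweep` and the tiny $N = 2$, $M = 1$ test problem.

```python
def poisson_gs_sweep(G, F):
    """One Gauss-Seidel sweep for (10.1.6), overwriting G in place.

    G and F are (N+2)-by-(M+2) lists of lists; the outer ring of G holds the
    zero boundary values G(0,j), G(N+1,j), G(i,0), G(i,M+1)."""
    N, M = len(G) - 2, len(G[0]) - 2
    for j in range(1, M + 1):
        for i in range(1, N + 1):
            G[i][j] = (F[i][j] + G[i - 1][j] + G[i + 1][j]
                       + G[i][j - 1] + G[i][j + 1]) / 4.0

N, M = 2, 1
F = [[0.0] * (M + 2) for _ in range(N + 2)]
G = [[0.0] * (M + 2) for _ in range(N + 2)]
F[1][1] = F[2][1] = 3.0      # f_1 = [3, 3]^T, so the solution is g_1 = [1, 1]^T
for _ in range(50):
    poisson_gs_sweep(G, F)
```

Only the two grids are stored; the coefficient matrix appears solely through the five-point update formula.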


Note that in this problem no storage is required for the matrix $A$.


10.1.4 Successive Over-Relaxation


The Gauss-Seidel iteration is very attractive because of its simplicity.
Unfortunately, if the spectral radius of $M_G^{-1}N_G$ is close to unity, then it may
be prohibitively slow because the error tends to zero like $\rho(M_G^{-1}N_G)^k$. To
rectify this, let $\omega \in \mathbb{R}$ and consider the following modification of the
Gauss-Seidel step:

    for $i = 1{:}n$
        $x_i^{(k+1)} = \omega \Bigl( b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)} \Bigr) \Big/ a_{ii} + (1 - \omega)\, x_i^{(k)}$          (10.1.7)
    end

This defines the method of successive over-relaxation (SOR). Using (10.1.4)
we see that in matrix terms, the SOR step is given by

\[ M_\omega x^{(k+1)} = N_\omega x^{(k)} + \omega b \qquad (10.1.8) \]

where $M_\omega = D + \omega L$ and $N_\omega = (1 - \omega)D - \omega U$. For a few structured (but
important) problems such as (10.1.6), the value of the relaxation parameter
$\omega$ that minimizes $\rho(M_\omega^{-1}N_\omega)$ is known. Moreover, a significant reduction
in $\rho(M_1^{-1}N_1) = \rho(M_G^{-1}N_G)$ can result. In more complicated problems,
however, it may be necessary to perform a fairly sophisticated eigenvalue
analysis in order to determine an appropriate $\omega$.
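
In code, the SOR step (10.1.7) is a one-line change to an in-place Gauss-Seidel sweep. The sketch below uses a hypothetical function name `sor_step`, an assumed 2-by-2 test system, and an arbitrary illustrative value $\omega = 1.1$; it makes no claim that this $\omega$ is optimal.

```python
def sor_step(A, b, x, omega):
    """One sweep of (10.1.7); omega = 1 reproduces the Gauss-Seidel step."""
    x = list(x)                      # work on a copy of the caller's iterate
    n = len(b)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = omega * (b[i] - s) / A[i][i] + (1.0 - omega) * x[i]
    return x

A = [[4.0, 1.0], [2.0, 5.0]]
b = [6.0, 12.0]                      # exact solution is [1, 2]
gs = sor_step(A, b, [0.0, 0.0], 1.0)
x = [0.0, 0.0]
for _ in range(60):
    x = sor_step(A, b, x, 1.1)
```

Each new component is a weighted average of the Gauss-Seidel update and the previous value, with weights $\omega$ and $1 - \omega$.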


10.1.5 The Chebyshev Semi-Iterative Method 


Another way to accelerate the convergence of an iterative method makes
use of Chebyshev polynomials. Suppose $x^{(1)}, \ldots, x^{(k)}$ have been
generated via the iteration $Mx^{(j+1)} = Nx^{(j)} + b$ and that we wish to
determine coefficients $\nu_j(k)$, $j = 0{:}k$, such that

$$ y^{(k)} = \sum_{j=0}^{k} \nu_j(k)\, x^{(j)} \tag{10.1.9} $$

represents an improvement over $x^{(k)}$. If $x^{(0)} = \cdots = x^{(k)} = x$,
then it is reasonable to insist that $y^{(k)} = x$. Hence, we require

$$ \sum_{j=0}^{k} \nu_j(k) = 1. \tag{10.1.10} $$

Subject to this constraint, we consider how to choose the $\nu_j(k)$ so that
the error in $y^{(k)}$ is minimized.

Recalling from the proof of Theorem 10.1.1 that $x^{(k)} - x = (M^{-1}N)^k e^{(0)}$
where $e^{(0)} = x^{(0)} - x$, we see that

$$ y^{(k)} - x = \sum_{j=0}^{k} \nu_j(k)\,(x^{(j)} - x) = \sum_{j=0}^{k} \nu_j(k)\,(M^{-1}N)^j e^{(0)}. $$

Working in the 2-norm we therefore obtain

$$ \| y^{(k)} - x \|_2 \le \| p_k(G) \|_2 \, \| e^{(0)} \|_2 \tag{10.1.11} $$

where $G = M^{-1}N$ and

$$ p_k(z) = \sum_{j=0}^{k} \nu_j(k)\, z^j. $$

Note that the condition (10.1.10) implies $p_k(1) = 1$.

At this point we assume that $G$ is symmetric with eigenvalues $\lambda_i$ that
satisfy $-1 < \alpha \le \lambda_n \le \cdots \le \lambda_1 \le \beta < 1$. It follows that

$$ \| p_k(G) \|_2 = \max_{\lambda_i \in \lambda(G)} |p_k(\lambda_i)| \le \max_{\alpha \le \lambda \le \beta} |p_k(\lambda)|. $$

Thus, to make the norm of $p_k(G)$ small, we need a polynomial $p_k(z)$ that
is small on $[\alpha, \beta]$ subject to the constraint that $p_k(1) = 1$.

Consider the Chebyshev polynomials $c_j(z)$ generated by the recursion
$c_j(z) = 2z\,c_{j-1}(z) - c_{j-2}(z)$ where $c_0(z) = 1$ and $c_1(z) = z$. These
polynomials satisfy $|c_j(z)| \le 1$ on $[-1, 1]$ but grow rapidly off this
interval. As a consequence, the polynomial

$$ p_k(z) = \frac{c_k\!\left(-1 + 2\,\dfrac{z-\alpha}{\beta-\alpha}\right)}{c_k(\mu)} $$

where

$$ \mu = -1 + 2\,\frac{1-\alpha}{\beta-\alpha} = 1 + 2\,\frac{1-\beta}{\beta-\alpha} $$

satisfies $p_k(1) = 1$ and tends to be small on $[\alpha, \beta]$. From the
definition of $p_k(z)$ and equation (10.1.11) we see

$$ \| y^{(k)} - x \|_2 \le \frac{\| x - x^{(0)} \|_2}{|c_k(\mu)|}. $$

Thus, the larger $\mu$ is, the greater the acceleration of convergence.
In order for the above to be a practical acceleration procedure, we need
a more efficient method for calculating $y^{(k)}$ than (10.1.9). We have been
tacitly assuming that $n$ is large and thus the retrieval of
$x^{(0)}, \ldots, x^{(k)}$ for large $k$ would be inconvenient or even impossible.

Fortunately, it is possible to derive a three-term recurrence among the
$y^{(k)}$ by exploiting the three-term recurrence among the Chebyshev
polynomials. In particular, it can be shown that if

$$ \omega_{k+1} = 2\,\frac{2-\beta-\alpha}{\beta-\alpha}\,\frac{c_k(\mu)}{c_{k+1}(\mu)} $$

then

$$
\begin{aligned}
y^{(k+1)} &= \omega_{k+1}\bigl(y^{(k)} - y^{(k-1)} + \gamma z^{(k)}\bigr) + y^{(k-1)} \\
M z^{(k)} &= b - A y^{(k)} \\
\gamma &= 2/(2 - \alpha - \beta)
\end{aligned}
\tag{10.1.12}
$$

where $y^{(0)} = x^{(0)}$ and $y^{(1)} = x^{(1)}$. We refer to this scheme as the
Chebyshev semi-iterative method associated with $My^{(k+1)} = Ny^{(k)} + b$. For
the acceleration to be effective we need good lower and upper bounds $\alpha$
and $\beta$. As in SOR, these parameters may be difficult to ascertain except
in a few structured problems.

Chebyshev semi-iterative methods are extensively analyzed in Varga 
(1962, chapter 5), as well as in Golub and Varga (1961). 
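The recurrence (10.1.12) can be sketched in a few lines of Python. The sketch below assumes the Jacobi splitting $M = \mathrm{diag}(A)$, which is our choice for illustration, and a tridiagonal test matrix for which the eigenvalue bounds are known in closed form ($\beta = \cos(\pi/(n+1))$, $\alpha = -\beta$). The names `matvec`, `jacobi`, and `chebyshev_jacobi` are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def jacobi(A, b, x0, steps):
    """Plain Jacobi iteration (M = diag(A), N = M - A), used as the baseline."""
    x = x0[:]
    for _ in range(steps):
        Ax = matvec(A, x)
        x = [x[i] + (b[i] - Ax[i]) / A[i][i] for i in range(len(b))]
    return x


def chebyshev_jacobi(A, b, x0, alpha, beta, steps):
    """Chebyshev semi-iterative acceleration (10.1.12) of the Jacobi splitting."""
    n = len(b)
    mu = (2.0 - alpha - beta) / (beta - alpha)
    gamma = 2.0 / (2.0 - alpha - beta)

    def correction(y):                       # z solves M z = b - A y with M = diag(A)
        Ay = matvec(A, y)
        return [(b[i] - Ay[i]) / A[i][i] for i in range(n)]

    y_prev = x0[:]                           # y^(0) = x^(0)
    z = correction(y_prev)
    y = [y_prev[i] + z[i] for i in range(n)] # y^(1) = x^(1): one plain Jacobi step
    c_prev, c = 1.0, mu                      # c_0(mu), c_1(mu)
    for _ in range(1, steps):
        c_next = 2.0 * mu * c - c_prev       # Chebyshev three-term recurrence
        omega = 2.0 * mu * c / c_next        # omega_{k+1}; note 2(2-beta-alpha)/(beta-alpha) = 2*mu
        z = correction(y)
        y_next = [omega * (y[i] - y_prev[i] + gamma * z[i]) + y_prev[i] for i in range(n)]
        y_prev, y = y, y_next
        c_prev, c = c, c_next
    return y


# Assumed test problem: tridiag(-1, 2, -1) of order 10, exact solution all ones.
n = 10
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [sum(row) for row in A]
beta = math.cos(math.pi / (n + 1))           # largest eigenvalue of the Jacobi matrix
alpha = -beta                                # smallest eigenvalue of the Jacobi matrix
x0 = [0.0] * n

err_jacobi = max(abs(v - 1.0) for v in jacobi(A, b, x0, 50))
err_cheb = max(abs(v - 1.0) for v in chebyshev_jacobi(A, b, x0, alpha, beta, 50))
```

Only the two most recent vectors `y_prev` and `y` are kept, which is the point of replacing (10.1.9) by a three-term recurrence.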


10.1.6 Symmetric SOR 


In deriving the Chebyshev acceleration we assumed that the iteration
matrix $G = M^{-1}N$ was symmetric. Thus, our simple analysis does not apply
to the unsymmetric SOR iteration matrix $M_\omega^{-1}N_\omega$. However, it is
possible to symmetrize the SOR method, making it amenable to Chebyshev
acceleration. The idea is to couple SOR with the backward SOR scheme


    for i = n:-1:1
        $x_i^{(k+1)} = \omega \Bigl( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k+1)} \Bigr) \Big/ a_{ii} + (1-\omega)\, x_i^{(k)}$    (10.1.13)
    end


This iteration is obtained by updating the unknowns in reverse order in
(10.1.7). Backward SOR can be described in matrix terms using (10.1.4).
In particular, we have $\tilde{M}_\omega x^{(k+1)} = \tilde{N}_\omega x^{(k)} + \omega b$ where

$$ \tilde{M}_\omega = D + \omega U \quad\text{and}\quad \tilde{N}_\omega = (1-\omega)D - \omega L. \tag{10.1.14} $$

If $A$ is symmetric ($U = L^T$), then $\tilde{M}_\omega = M_\omega^T$ and
$\tilde{N}_\omega = N_\omega^T$, and we have the iteration

$$
\begin{aligned}
M_\omega x^{(k+1/2)} &= N_\omega x^{(k)} + \omega b \\
M_\omega^T x^{(k+1)} &= N_\omega^T x^{(k+1/2)} + \omega b.
\end{aligned}
\tag{10.1.15}
$$

It is clear that $G = M_\omega^{-T} N_\omega^T M_\omega^{-1} N_\omega$ is the iteration matrix for this
method. From the definitions of $M_\omega$ and $N_\omega$ it follows that

$$ G = M^{-1}N \equiv (M_\omega D^{-1} M_\omega^T)^{-1} (N_\omega^T D^{-1} N_\omega). \tag{10.1.16} $$

If $D$ has positive diagonal entries and $KK^T = N_\omega^T D^{-1} N_\omega$ is the Cholesky
factorization, then $K^T G K^{-T} = K^T (M_\omega D^{-1} M_\omega^T)^{-1} K$. Thus, $G$ is similar
to a symmetric matrix and has real eigenvalues.

The iteration (10.1.15) is called the symmetric successive over-relaxation
(SSOR) method. It is frequently used in conjunction with the Chebyshev
semi-iterative acceleration.
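As a rough illustration, one SSOR step (10.1.15) is a forward sweep (10.1.7) followed by a backward sweep (10.1.13). The Python sketch below uses a hypothetical function name `ssor_step`, a small tridiagonal test matrix, and $\omega = 1.2$, all chosen by us for illustration.

```python
def ssor_step(A, b, x, omega):
    """One SSOR step (10.1.15): forward SOR sweep, then backward SOR sweep, in place."""
    n = len(b)
    for order in (range(n), range(n - 1, -1, -1)):   # forward order, then reverse order
        for i in order:
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = omega * (b[i] - sigma) / A[i][i] + (1.0 - omega) * x[i]
    return x


# Assumed test problem: tridiag(-1, 2, -1) of order 10, exact solution all ones.
n = 10
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [sum(row) for row in A]

x = [0.0] * n
for _ in range(1000):
    ssor_step(A, b, x, omega=1.2)
ssor_error = max(abs(v - 1.0) for v in x)
```

The two sweeps share the same update formula; only the order in which the unknowns are visited changes, which is what makes the combined iteration matrix similar to a symmetric matrix when $A$ is symmetric.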


Problems 


P10.1.1 Show that the Jacobi iteration can be written in the form $x^{(k+1)} = x^{(k)} + Hr^{(k)}$
where $r^{(k)} = b - Ax^{(k)}$. Repeat for the Gauss-Seidel iteration.


P10.1.2 Show that if $A$ is strictly diagonally dominant, then the Gauss-Seidel iteration
converges.


P10.1.3 Show that the Jacobi iteration converges for 2-by-2 symmetric positive definite
systems.


P10.1.4 Show that if $A = M - N$ is singular, then we can never have $\rho(M^{-1}N) < 1$
even if $M$ is nonsingular.


P10.1.5 Prove (10.1.16).


P10.1.6 Prove the converse of Theorem 10.1.1. In other words, show that if the iteration
$Mx^{(k+1)} = Nx^{(k)} + b$ always converges, then $\rho(M^{-1}N) < 1$.


P10.1.7 (Supplied by R.S. Varga) Suppose that

$$ A_1 = \begin{bmatrix} 1 & -1/2 \\ -1/2 & 1 \end{bmatrix}, \qquad A_2 = \begin{bmatrix} 1 & -3/4 \\ -1/12 & 1 \end{bmatrix}. $$

Let $J_1$ and $J_2$ be the associated Jacobi iteration matrices. Show that $\rho(J_1) > \rho(J_2)$,
thereby refuting the claim that greater diagonal dominance implies more rapid Jacobi
convergence.


P10.1.8 The Chebyshev algorithm is defined in terms of parameters

$$ \omega_{k+1} = \frac{2\,c_k(1/\rho)}{\rho\, c_{k+1}(1/\rho)} $$

where $c_k(\lambda) = \cosh[k \cosh^{-1}(\lambda)]$ with $\lambda > 1$. (a) Show that $1 < \omega_k < 2$ for $k > 1$
whenever $0 < \rho < 1$. (b) Verify that $\omega_{k+1} < \omega_k$. (c) Determine $\lim \omega_k$ as $k \to \infty$.


P10.1.9 Consider the 2-by-2 matrix


(a) Under what conditions will Gauss-Seidel converge with this matrix? (b) For what
range of $\omega$ will the SOR method converge? What is the optimal choice for this parameter?
(c) Repeat (a) and (b) for the matrix


= I_n   S
where $S \in \mathbb{R}^{n \times n}$. Hint: Use the SVD of S.


P10.1.10 We want to investigate the solution of $Au = f$ where $A \ne A^T$. For a model
problem, consider the finite difference approximation to

$$ -u'' + \sigma u' = 0, \qquad 0 < x < 1 $$

where $u(0) = 10$ and $u(1) = 10\exp(\sigma)$. This leads to the difference equation

$$ -u_{i-1} + 2u_i - u_{i+1} + R\,(u_{i+1} - u_{i-1}) = 0, \qquad i = 1{:}n $$

where $R = \sigma h/2$, $u_0 = 10$, and $u_{n+1} = 10\exp(\sigma)$. The number $R$ should be less than
1. What is the convergence rate for the iteration $Mu^{(k+1)} = Nu^{(k)} + f$ where
$M = (A + A^T)/2$ and $N = (A^T - A)/2$?


P10.1.11 Consider the iteration


where $B$ has Schur decomposition $Q^TBQ = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ with $\lambda_1 \ge \cdots \ge \lambda_n$.
Assume that $x = Bx + d$. (a) Derive an equation for $e^{(k)} = y^{(k)} - x$. (b) Assume
$y^{(1)} = By^{(0)} + d$. Show that $e^{(k)} = p_k(B)e^{(0)}$ where $p_k$ is an even polynomial if $k$ is
even and an odd polynomial if $k$ is odd. (c) Write $f^{(k)} = Q^Te^{(k)}$. Derive a difference
equation for $f_j^{(k)}$ for $j = 1{:}n$. Try to specify the exact solution for general $f_j^{(0)}$ and $f_j^{(1)}$.
(d) Show how to determine an optimal $\omega$.


10.2 The Conjugate Gradient Method


A difficulty associated with the SOR, Chebyshev semi-iterative, and related
methods is that they depend upon parameters that are sometimes hard to
choose properly. For example, the Chebyshev acceleration scheme needs
good estimates of the largest and smallest eigenvalues of the underlying
iteration matrix $M^{-1}N$. Unless this matrix is sufficiently structured, it
may be analytically impossible and/or computationally expensive to do
this.

In this section, we present a method without this difficulty for the
symmetric positive definite $Ax = b$ problem, the well-known Hestenes-Stiefel
conjugate gradient method. We derived this method in §9.3.1 from the
Lanczos algorithm. The derivation now is from a different point of view
and it will set the stage for various important generalizations in §10.3 and
§10.4.


10.2.1 Steepest Descent 


The starting point in the derivation is to consider how we might go about
minimizing the function

$$ \phi(x) = \tfrac{1}{2}\, x^T A x - x^T b $$

where $b \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$ is assumed to be positive definite and
symmetric. The minimum value of $\phi$ is $-b^T A^{-1} b/2$, achieved by setting
$x = A^{-1}b$. Thus, minimizing $\phi$ and solving $Ax = b$ are equivalent problems
if $A$ is symmetric positive definite.

One of the simplest strategies for minimizing $\phi$ is the method of steepest
descent. At a current point $x_c$, the function $\phi$ decreases most rapidly in the
direction of the negative gradient: $-\nabla\phi(x_c) = b - Ax_c$. We call

$$ r_c = b - Ax_c $$

the residual of $x_c$. If the residual is nonzero, then there exists a positive
$\alpha$ such that $\phi(x_c + \alpha r_c) < \phi(x_c)$. In the method of steepest descent (with
exact line search) we set $\alpha = r_c^T r_c / r_c^T A r_c$, thereby minimizing

$$ \phi(x_c + \alpha r_c) = \phi(x_c) - \alpha\, r_c^T r_c + \tfrac{1}{2}\, \alpha^2\, r_c^T A r_c. $$

This gives


    $x_0$ = initial guess
    $r_0 = b - Ax_0$
    k = 0
    while $r_k \ne 0$
        k = k + 1
        $\alpha_k = r_{k-1}^T r_{k-1} / r_{k-1}^T A r_{k-1}$                (10.2.1)
        $x_k = x_{k-1} + \alpha_k r_{k-1}$
        $r_k = b - Ax_k$
    end


It can be shown that

$$ \left( \phi(x_k) + \tfrac{1}{2}\, b^T A^{-1} b \right) \le \left( 1 - \frac{1}{\kappa_2(A)} \right) \left( \phi(x_{k-1}) + \tfrac{1}{2}\, b^T A^{-1} b \right) \tag{10.2.2} $$

which implies global convergence. Unfortunately, the rate of convergence
may be prohibitively slow if the condition number $\kappa_2(A) = \lambda_1(A)/\lambda_n(A)$ is large.
Geometrically this means that the level curves of $\phi$ are very elongated
hyperellipsoids and minimization corresponds to finding the lowest point
in a relatively flat, steep-sided valley. In steepest descent, we are forced
to traverse back and forth across the valley rather than down the valley.
Stated another way, the gradient directions that arise during the iteration
are not different enough.
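The dependence on the condition number is easy to observe numerically. The following Python sketch of (10.2.1) replaces the exact test $r_k \ne 0$ by a tolerance on the residual norm (an assumption on our part) and runs on a hypothetical 2-by-2 problem $A = \mathrm{diag}(1, \kappa)$, $b = (\kappa, \kappa)^T$, whose starting residual is chosen so that the iterates zigzag across the valley.

```python
def steepest_descent(A, b, x0, tol=1e-10, max_iter=100_000):
    """Steepest descent with exact line search, following (10.2.1)."""
    n = len(b)
    x = x0[:]
    for k in range(max_iter):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        r = [b[i] - Ax[i] for i in range(n)]              # residual = negative gradient
        rr = sum(ri * ri for ri in r)
        if rr ** 0.5 <= tol:                              # practical stand-in for r_k = 0
            return x, k
        Ar = [sum(A[i][j] * r[j] for j in range(n)) for i in range(n)]
        alpha = rr / sum(r[i] * Ar[i] for i in range(n))  # exact line search step
        x = [x[i] + alpha * r[i] for i in range(n)]
    return x, max_iter


def run(kappa):
    """Solve diag(1, kappa) x = (kappa, kappa); the exact solution is (kappa, 1)."""
    A = [[1.0, 0.0], [0.0, kappa]]
    b = [kappa, kappa]
    return steepest_descent(A, b, [0.0, 0.0])


x_well, iters_well = run(2.0)      # well conditioned
x_ill, iters_ill = run(100.0)      # ill conditioned
```

For this starting point the residual norm shrinks by exactly $(\kappa-1)/(\kappa+1)$ per step, so the iteration count grows roughly in proportion to $\kappa$.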


10.2.2 General Search Directions 


To avoid the pitfalls of steepest descent, we consider the successive
minimization of $\phi$ along a set of directions $\{p_1, p_2, \ldots\}$ that do not
necessarily correspond to the residuals $\{r_0, r_1, \ldots\}$. It is easy to show that
$\phi(x_{k-1} + \alpha p_k)$ is minimized by setting

$$ \alpha = \alpha_k = p_k^T r_{k-1} / p_k^T A p_k. $$

With this choice it can be shown that

$$ \phi(x_{k-1} + \alpha_k p_k) = \phi(x_{k-1}) - \frac{1}{2}\,\frac{(p_k^T r_{k-1})^2}{p_k^T A p_k}. \tag{10.2.3} $$

To ensure a reduction in the size of $\phi$ we insist that $p_k$ not be orthogonal
to $r_{k-1}$. This leads to the following framework:


    $x_0$ = initial guess
    $r_0 = b - Ax_0$
    k = 0
    while $r_k \ne 0$
        k = k + 1                                                   (10.2.4)
        Choose a direction $p_k$ such that $p_k^T r_{k-1} \ne 0$.
        $\alpha_k = p_k^T r_{k-1} / p_k^T A p_k$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = b - Ax_k$
    end


Note that

$$ x_k \in x_0 + \mathrm{span}\{p_1, \ldots, p_k\} \equiv \{\, x_0 + \gamma_1 p_1 + \cdots + \gamma_k p_k : \gamma_i \in \mathbb{R} \,\}. $$

Our goal is to choose the search directions in a way that guarantees
convergence without the shortcomings of steepest descent.


10.2.3 A-Conjugate Search Directions 
If the search directions are linearly independent and $x_k$ solves the problem

$$ \min_{x \,\in\, x_0 + \mathrm{span}\{p_1, \ldots, p_k\}} \phi(x) \tag{10.2.5} $$

for $k = 1, 2, \ldots$, then convergence is guaranteed in at most $n$ steps. This is
because $x_n$ minimizes $\phi$ over $\mathbb{R}^n$ and therefore satisfies $Ax_n = b$.

However, for this to be a viable approach the search directions must
have the property that it is “easy” to compute $x_k$ given $x_{k-1}$. Let us see
what this says about the determination of $p_k$. If

$$ x_k = x_0 + P_{k-1}y + \alpha p_k $$

where $P_{k-1} = [\,p_1, \ldots, p_{k-1}\,]$, $y \in \mathbb{R}^{k-1}$, and $\alpha \in \mathbb{R}$, then

$$ \phi(x_k) = \phi(x_0 + P_{k-1}y) + \alpha\, y^T P_{k-1}^T A p_k + \frac{\alpha^2}{2}\, p_k^T A p_k - \alpha\, p_k^T r_0. $$

If $p_k \in \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\}^{\perp}$, then the cross term $\alpha\, y^T P_{k-1}^T A p_k$ is zero
and the search for the minimizing $x_k$ splits into a pair of uncoupled
minimizations, one for $y$ and one for $\alpha$:

$$
\begin{aligned}
\min_{x_k \,\in\, x_0 + \mathrm{span}\{p_1, \ldots, p_k\}} \phi(x_k)
 &= \min_{y,\,\alpha}\; \phi(x_0 + P_{k-1}y + \alpha p_k) \\
 &= \min_{y,\,\alpha} \left( \phi(x_0 + P_{k-1}y) + \frac{\alpha^2}{2}\, p_k^T A p_k - \alpha\, p_k^T r_0 \right) \\
 &= \min_{y}\; \phi(x_0 + P_{k-1}y) + \min_{\alpha} \left( \frac{\alpha^2}{2}\, p_k^T A p_k - \alpha\, p_k^T r_0 \right).
\end{aligned}
$$

Note that if $y_{k-1}$ solves the first min problem then $x_{k-1} = x_0 + P_{k-1}y_{k-1}$
minimizes $\phi$ over $x_0 + \mathrm{span}\{p_1, \ldots, p_{k-1}\}$. The solution to the $\alpha$ min
problem is given by $\alpha_k = p_k^T r_0 / p_k^T A p_k$. Note that because of A-conjugacy,

$$
\begin{aligned}
p_k^T r_{k-1} &= p_k^T (b - Ax_{k-1}) \\
 &= p_k^T \bigl(b - A(x_0 + P_{k-1}y_{k-1})\bigr) = p_k^T r_0.
\end{aligned}
$$

With these results it follows that $x_k = x_{k-1} + \alpha_k p_k$ and we obtain the
following instance of (10.2.4):


    $x_0$ = initial guess
    k = 0
    $r_0 = b - Ax_0$
    while $r_k \ne 0$
        k = k + 1
        Choose $p_k \in \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\}^{\perp}$ so $p_k^T r_{k-1} \ne 0$.     (10.2.6)
        $\alpha_k = p_k^T r_{k-1} / p_k^T A p_k$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = b - Ax_k$
    end


The following lemma shows that it is possible to find the search directions 
with the required properties. 


Lemma 10.2.1 If $r_{k-1} \ne 0$, then there exists a $p_k \in \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\}^{\perp}$
such that $p_k^T r_{k-1} \ne 0$.


Proof. For the case $k = 1$, set $p_1 = r_0$. If $k > 1$, then since $r_{k-1} \ne 0$ it
follows that

$$
\begin{aligned}
A^{-1}b \notin x_0 + \mathrm{span}\{p_1, \ldots, p_{k-1}\}
 &\;\Rightarrow\; b \notin Ax_0 + \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\} \\
 &\;\Rightarrow\; r_0 \notin \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\}.
\end{aligned}
$$

Thus there exists a $p \in \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\}^{\perp}$ such that $p^T r_0 \ne 0$. But
$x_{k-1} \in x_0 + \mathrm{span}\{p_1, \ldots, p_{k-1}\}$ and so $r_{k-1} \in r_0 + \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\}$.
It follows that $p^T r_{k-1} = p^T r_0 \ne 0$. □


The search directions in (10.2.6) are said to be A-conjugate because
$p_i^T A p_j = 0$ for all $i \ne j$. Note that if $P_k = [\,p_1, \ldots, p_k\,]$ is the matrix of these
vectors, then

$$ P_k^T A P_k = \mathrm{diag}(p_1^T A p_1, \ldots, p_k^T A p_k) $$

is nonsingular since $A$ is positive definite and the search directions are
nonzero. It follows that $P_k$ has full column rank. This guarantees
convergence in (10.2.6) in at most $n$ steps because $x_n$ (if we get that far) minimizes
$\phi(x)$ over $\mathrm{ran}(P_n) = \mathbb{R}^n$.


10.2.4 Choosing a Best Search Direction


A way to combine the positive aspects of steepest descent and A-conjugate
searching is to choose $p_k$ in (10.2.6) to be the closest vector to $r_{k-1}$ that is
A-conjugate to $p_1, \ldots, p_{k-1}$. This defines “version zero” of the method of
conjugate gradients:


    $x_0$ = initial guess
    k = 0
    $r_0 = b - Ax_0$
    while $r_k \ne 0$
        k = k + 1
        if k = 1
            $p_1 = r_0$
        else                                                        (10.2.7)
            Let $p_k$ minimize $\| p - r_{k-1} \|_2$ over all vectors
            $p \in \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\}^{\perp}$.
        end
        $\alpha_k = p_k^T r_{k-1} / p_k^T A p_k$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = b - Ax_k$
    end
    $x = x_k$


To make this an effective sparse $Ax = b$ solver, we need an efficient method
for computing $p_k$. A considerable amount of analysis is required to develop
the final recursions. The first step is to show that $p_k$ is the minimum
residual of a certain least squares problem.


Lemma 10.2.2 For $k \ge 2$ the vectors $p_k$ generated by (10.2.7) satisfy

$$ p_k = r_{k-1} - AP_{k-1}z_{k-1}, $$

where $P_{k-1} = [\,p_1, \ldots, p_{k-1}\,]$ and $z_{k-1}$ solves $\min_{z \in \mathbb{R}^{k-1}} \| r_{k-1} - AP_{k-1}z \|_2$.


Proof. Suppose $z_{k-1}$ solves the above LS problem and let $p$ be the
associated minimum residual:

$$ p = r_{k-1} - AP_{k-1}z_{k-1}. $$

It follows that $p^T AP_{k-1} = 0$. Moreover, $p = [\,I - (AP_{k-1})(AP_{k-1})^{+}\,]\, r_{k-1}$
is the orthogonal projection of $r_{k-1}$ into $\mathrm{ran}(AP_{k-1})^{\perp}$ and so it is the
closest vector in $\mathrm{ran}(AP_{k-1})^{\perp}$ to $r_{k-1}$. Thus, $p = p_k$. □


With this result we can establish a number of important relationships
between the residuals $r_k$, the search directions $p_k$, and the Krylov subspaces
$\mathcal{K}(r_0, A, k) = \mathrm{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\}$.

Theorem 10.2.3 After $k$ iterations in (10.2.7) we have

$$ r_k = r_{k-1} - \alpha_k A p_k \tag{10.2.8} $$
$$ P_k^T r_k = 0 \tag{10.2.9} $$
$$ \mathrm{span}\{p_1, \ldots, p_k\} = \mathrm{span}\{r_0, \ldots, r_{k-1}\} = \mathcal{K}(r_0, A, k) \tag{10.2.10} $$

and the residuals $r_0, \ldots, r_k$ are mutually orthogonal.


Proof. Equation (10.2.8) follows by applying $A$ to both sides of
$x_k = x_{k-1} + \alpha_k p_k$ and using the definition of the residual.

To prove (10.2.9), we recall that $x_k = x_0 + P_k y_k$ where $y_k$ is the
minimizer of

$$ \phi(x_0 + P_k y) = \phi(x_0) + \tfrac{1}{2}\, y^T (P_k^T A P_k)\, y - y^T P_k^T (b - Ax_0). $$

But this means that $y_k$ solves the linear system $(P_k^T A P_k)\,y = P_k^T (b - Ax_0)$.
Thus

$$ 0 = P_k^T (b - Ax_0) - P_k^T A P_k y_k = P_k^T \bigl(b - A(x_0 + P_k y_k)\bigr) = P_k^T r_k. $$

To prove (10.2.10) we note from (10.2.8) that

$$ \{Ap_1, \ldots, Ap_{k-1}\} \subseteq \mathrm{span}\{r_0, \ldots, r_{k-1}\} $$

and so from Lemma 10.2.2,

$$ p_k = r_{k-1} - [\,Ap_1, \ldots, Ap_{k-1}\,]\, z_{k-1} \in \mathrm{span}\{r_0, \ldots, r_{k-1}\}. $$

It follows that

$$ [\,p_1, \ldots, p_k\,] = [\,r_0, \ldots, r_{k-1}\,]\, T $$

for some upper triangular $T$. Since the search directions are independent,
$T$ is nonsingular. This shows

$$ \mathrm{span}\{p_1, \ldots, p_k\} = \mathrm{span}\{r_0, \ldots, r_{k-1}\}. $$

Using (10.2.8) we see that

$$ r_k \in \mathrm{span}\{r_{k-1}, Ap_k\} \subseteq \mathrm{span}\{r_{k-1}, Ar_0, \ldots, Ar_{k-1}\}. $$

The Krylov space connection in (10.2.10) follows from this by induction.

Finally, to establish the mutual orthogonality of the residuals, we note
from (10.2.9) that $r_k$ is orthogonal to any vector in the range of $P_k$. But
from (10.2.10) this subspace contains $r_0, \ldots, r_{k-1}$. □


Using these facts we next show that $p_k$ is a simple linear combination
of its predecessor $p_{k-1}$ and the “current” residual $r_{k-1}$.


Corollary 10.2.4 The residuals and search directions in (10.2.7) have the
property that $p_k \in \mathrm{span}\{p_{k-1}, r_{k-1}\}$ for $k \ge 2$.


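Proof. If $k = 2$, then from (10.2.10) $p_2 \in \mathrm{span}\{r_0, r_1\}$. But $p_1 = r_0$ and
so $p_2$ is a linear combination of $p_1$ and $r_1$.

If $k > 2$, then partition the vector $z_{k-1}$ of Lemma 10.2.2 as

$$ z_{k-1} = \begin{bmatrix} w \\ \mu \end{bmatrix}, \qquad w \in \mathbb{R}^{k-2}, \; \mu \in \mathbb{R}. $$

Using the identity $r_{k-1} = r_{k-2} - \alpha_{k-1} A p_{k-1}$, we see that

$$
\begin{aligned}
p_k &= r_{k-1} - AP_{k-1}z_{k-1} = r_{k-1} - AP_{k-2}w - \mu A p_{k-1} \\
    &= \left( 1 + \frac{\mu}{\alpha_{k-1}} \right) r_{k-1} + s_{k-1}
\end{aligned}
$$

where

$$
\begin{aligned}
s_{k-1} &= -\frac{\mu}{\alpha_{k-1}}\, r_{k-2} - AP_{k-2}w \\
 &\in \mathrm{span}\{r_{k-2}, AP_{k-2}w\} \\
 &\subseteq \mathrm{span}\{r_{k-2}, Ap_1, \ldots, Ap_{k-2}\} \\
 &\subseteq \mathrm{span}\{r_0, \ldots, r_{k-2}\}.
\end{aligned}
$$

Because the $r_i$ are mutually orthogonal, it follows that $s_{k-1}$ and $r_{k-1}$ are
orthogonal to each other. Thus, the least squares problem of Lemma 10.2.2
boils down to choosing $\mu$ and $w$ such that

$$ \| p_k \|_2^2 = \left( 1 + \frac{\mu}{\alpha_{k-1}} \right)^2 \| r_{k-1} \|_2^2 + \| s_{k-1} \|_2^2 $$

is minimum. Since the 2-norm of $r_{k-2} - AP_{k-2}z$ is minimized by $z_{k-2}$ giving
residual $p_{k-1}$, it follows that $s_{k-1}$ is a multiple of $p_{k-1}$. Consequently,
$p_k \in \mathrm{span}\{r_{k-1}, p_{k-1}\}$. □


We are now set to derive a very simple expression for $p_k$. Without loss
of generality we may assume from Corollary 10.2.4 that

$$ p_k = r_{k-1} + \beta_k p_{k-1}. $$

Since $p_{k-1}^T A p_k = 0$ it follows that

$$ \beta_k = -\frac{p_{k-1}^T A r_{k-1}}{p_{k-1}^T A p_{k-1}}. $$

This leads us to “version 1” of the conjugate gradient method:


    $x_0$ = initial guess
    k = 0
    $r_0 = b - Ax_0$
    while $r_k \ne 0$
        k = k + 1
        if k = 1
            $p_1 = r_0$
        else
            $\beta_k = -p_{k-1}^T A r_{k-1} / p_{k-1}^T A p_{k-1}$
            $p_k = r_{k-1} + \beta_k p_{k-1}$                          (10.2.11)
        end
        $\alpha_k = p_k^T r_{k-1} / p_k^T A p_k$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = b - Ax_k$
    end
    $x = x_k$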


In this implementation, the method requires three separate matrix-vector
multiplications per step. However, by computing residuals recursively via
$r_k = r_{k-1} - \alpha_k A p_k$ and substituting

$$ r_{k-1}^T r_{k-1} = -\alpha_{k-1}\, r_{k-1}^T A p_{k-1} \tag{10.2.12} $$

and

$$ r_{k-2}^T r_{k-2} = \alpha_{k-1}\, p_{k-1}^T A p_{k-1} \tag{10.2.13} $$

into the formula for $\beta_k$, we obtain the following more efficient version:


Algorithm 10.2.1 [Conjugate Gradients] If $A \in \mathbb{R}^{n \times n}$ is symmetric
positive definite, $b \in \mathbb{R}^n$, and $x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then
the following algorithm computes $x \in \mathbb{R}^n$ so $Ax = b$.


    k = 0
    $r_0 = b - Ax_0$
    while $r_k \ne 0$
        k = k + 1
        if k = 1
            $p_1 = r_0$
        else
            $\beta_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}$
            $p_k = r_{k-1} + \beta_k p_{k-1}$
        end
        $\alpha_k = r_{k-1}^T r_{k-1} / p_k^T A p_k$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = r_{k-1} - \alpha_k A p_k$
    end
    $x = x_k$


This procedure is essentially the form of the conjugate gradient algorithm
that appears in the original paper by Hestenes and Stiefel (1952). Note
that only one matrix-vector multiplication is required per iteration.


10.2.5 The Lanczos Connection 


In §9.3.1 we derived the conjugate gradient method from the Lanczos
algorithm. Now let us look at the connections between these two algorithms
in the reverse direction by “deriving” the Lanczos process from conjugate
gradients. Define the matrix of residuals $R_k \in \mathbb{R}^{n \times k}$ by

$$ R_k = [\,r_0, \ldots, r_{k-1}\,] $$

and the upper bidiagonal matrix $B_k \in \mathbb{R}^{k \times k}$ by

$$
B_k = \begin{bmatrix}
1 & -\beta_2 & 0 & \cdots & 0 \\
0 & 1 & -\beta_3 & & \vdots \\
\vdots & & \ddots & \ddots & 0 \\
 & & & \ddots & -\beta_k \\
0 & \cdots & & 0 & 1
\end{bmatrix}.
$$

From the equations $p_i = r_{i-1} + \beta_i p_{i-1}$, $i = 2{:}k$, and $p_1 = r_0$ it follows that
$R_k = P_k B_k$. Since the columns of $P_k = [\,p_1, \ldots, p_k\,]$ are A-conjugate, we
see that $R_k^T A R_k = B_k^T\, \mathrm{diag}(p_1^T A p_1, \ldots, p_k^T A p_k)\, B_k$ is tridiagonal. From
(10.2.10) it follows that if

$$ \Delta = \mathrm{diag}(\rho_0, \ldots, \rho_{k-1}), \qquad \rho_i = \| r_i \|_2 $$

then the columns of $R_k \Delta^{-1}$ form an orthonormal basis for the subspace
$\mathrm{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\}$. Consequently, the columns of this matrix are
essentially the Lanczos vectors of Algorithm 9.3.1, i.e.,

$$ q_i = \pm\, r_{i-1}/\rho_{i-1}, \qquad i = 1{:}k. $$

Moreover, the tridiagonal matrix associated with these Lanczos vectors is
given by

$$ T_k = \Delta^{-1} B_k^T\, \mathrm{diag}(p_i^T A p_i)\, B_k\, \Delta^{-1}. \tag{10.2.14} $$

The diagonal and subdiagonal of this matrix involve quantities that are
readily available during the conjugate gradient iteration. Thus, we can
obtain good estimates of A's extremal eigenvalues (and condition number)
as we generate the $x_k$ in Algorithm 10.2.1.


10.2.6 Some Practical Details


The termination criterion in Algorithm 10.2.1 is unrealistic. Rounding errors
lead to a loss of orthogonality among the residuals and finite termination
is not mathematically guaranteed. Moreover, when the conjugate gradient
method is applied, $n$ is usually so big that $O(n)$ iterations represent an
unacceptable amount of work. As a consequence of these observations, it
is customary to regard the method as a genuinely iterative technique with
termination based upon an iteration maximum $k_{max}$ and the residual norm.
This leads to the following practical version of Algorithm 10.2.1:


    x = initial guess
    k = 0
    r = b - Ax
    $\rho_0 = \| r \|_2^2$
    while ($\sqrt{\rho_k} > \epsilon \| b \|_2$) $\wedge$ ($k < k_{max}$)
        k = k + 1
        if k = 1
            p = r
        else                                                        (10.2.16)
            $\beta_k = \rho_{k-1}/\rho_{k-2}$
            $p = r + \beta_k p$
        end
        w = Ap
        $\alpha_k = \rho_{k-1}/p^T w$
        $x = x + \alpha_k p$
        $r = r - \alpha_k w$
        $\rho_k = \| r \|_2^2$
    end


This algorithm requires one matrix-vector multiplication and $10n$ flops per
iteration. Notice that just four n-vectors of storage are essential: x, r, p,
and w. The subscripting of the scalars is not necessary and is only done
here to facilitate comparison with Algorithm 10.2.1.
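For readers who want to experiment, here is a minimal Python transcription of the practical iteration (10.2.16). The function name `conjugate_gradient`, the dense list-of-lists matrix, and the tridiagonal test problem are assumptions made for illustration; a production code would apply $A$ through a sparse matrix-vector product.

```python
def conjugate_gradient(A, b, x0, eps=1e-12, kmax=1000):
    """Practical conjugate gradients, following (10.2.16); returns (x, iterations)."""
    n = len(b)
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

    x = x0[:]
    Ax = matvec(x)
    r = [b[i] - Ax[i] for i in range(n)]
    rho = dot(r, r)                    # rho_k = ||r||_2^2 for the current residual
    rho_prev = None                    # rho_{k-2} when computing beta_k
    norm_b = dot(b, b) ** 0.5
    p = None
    k = 0
    while rho ** 0.5 > eps * norm_b and k < kmax:
        k += 1
        if k == 1:
            p = r[:]
        else:
            beta = rho / rho_prev      # beta_k = rho_{k-1} / rho_{k-2}
            p = [r[i] + beta * p[i] for i in range(n)]
        w = matvec(p)                  # the single matrix-vector product of the step
        alpha = rho / dot(p, w)        # alpha_k = rho_{k-1} / p^T w
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * w[i] for i in range(n)]
        rho_prev, rho = rho, dot(r, r)
    return x, k


# Assumed test problem: tridiag(-1, 2, -1) of order 20, exact solution all ones.
n = 20
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [sum(row) for row in A]
x_cg, k_cg = conjugate_gradient(A, b, [0.0] * n)
```

As in the text, only the four vectors `x`, `r`, `p`, and `w` are live inside the loop, and the scalars `rho` and `rho_prev` play the roles of $\rho_{k-1}$ and $\rho_{k-2}$.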

It is also possible to base the termination criterion on heuristic estimates
of the error $A^{-1}r_k$ by approximating $\| A^{-1} \|_2$ with the reciprocal of the
smallest eigenvalue of the tridiagonal matrix $T_k$ given in (10.2.14).

The idea of regarding conjugate gradients as an iterative method began
with Reid (1971). The iterative point of view is useful but then the rate of
convergence is central to the method's success.


10.2.7 Convergence Properties 


We conclude this section by examining the convergence of the conjugate
gradient iterates $\{x_k\}$. Two results are given and they both say that the
method performs well when $A$ is near the identity either in the sense of a
low rank perturbation or in the sense of norm.


Theorem 10.2.5 If $A = I + B$ is an n-by-n symmetric positive definite
matrix and $\mathrm{rank}(B) = r$ then Algorithm 10.2.1 converges in at most $r + 1$
steps.


Proof. The dimension of

$$ \mathrm{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\} = \mathrm{span}\{r_0, Br_0, \ldots, B^{k-1}r_0\} $$

cannot exceed $r + 1$. Since $p_1, \ldots, p_k$ span this subspace and are
independent, the iteration cannot progress beyond $r + 1$ steps. □


An important metatheorem follows from this result:


• If $A$ is close to a rank $r$ correction to the identity, then Algorithm
  10.2.1 almost converges after $r + 1$ steps.


We show how this heuristic can be exploited in the next section.
An error bound of a different flavor can be obtained in terms of the
A-norm which we define as follows:

$$ \| w \|_A = \sqrt{w^T A w}. $$

Theorem 10.2.6 Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric positive definite and
$b \in \mathbb{R}^n$. If Algorithm 10.2.1 produces iterates $\{x_k\}$ and $\kappa = \kappa_2(A)$ then

$$ \| x - x_k \|_A \le 2\, \| x - x_0 \|_A \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k. $$

Proof. See Luenberger (1973, p.187). □


The accuracy of the $\{x_k\}$ is often much better than this theorem predicts.
However, a heuristic version of Theorem 10.2.6 turns out to be very useful:


• The conjugate gradient method converges very fast in the A-norm if
  $\kappa_2(A) \approx 1$.
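To get a feel for the bound in Theorem 10.2.6, one can compute the smallest $k$ for which $2\bigl((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)\bigr)^k$ falls below a target reduction in the A-norm error. The helper name `cg_steps_bound` and the sample values of $\kappa$ are ours; the result is only an upper estimate, since the theorem is often pessimistic.

```python
import math


def cg_steps_bound(kappa, tol):
    """Smallest k with 2 * ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))**k <= tol."""
    if kappa <= 1.0:
        return 1                       # kappa = 1: the bound vanishes for every k >= 1
    q = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    return max(0, math.ceil(math.log(tol / 2.0) / math.log(q)))


steps_kappa_4 = cg_steps_bound(4.0, 1e-6)      # q = 1/3
steps_kappa_100 = cg_steps_bound(100.0, 1e-6)  # q = 9/11
```

With $\kappa = 4$ the contraction factor is $1/3$, so a millionfold reduction is guaranteed within 14 steps; with $\kappa = 100$ the factor is $9/11$ and the guaranteed step count is several times larger.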


In the next section we show how we can sometimes convert a given $Ax = b$
problem into a related $\tilde{A}\tilde{x} = \tilde{b}$ problem with $\tilde{A}$ being close to the identity.


Problems


P10.2.1 Verify that the residuals in (10.2.1) satisfy $r_i^T r_j = 0$ whenever $j = i + 1$.

P10.2.2 Verify (10.2.2).

P10.2.3 Verify (10.2.3).

P10.2.4 Verify (10.2.12) and (10.2.13).

P10.2.5 Give formulas for the entries of the tridiagonal matrix $T_k$ in (10.2.14).

P10.2.6 Compare the work and storage requirements associated with the practical
implementation of Algorithms 9.3.1 and 10.2.1.

P10.2.7 Show that if $A \in \mathbb{R}^{n \times n}$ is symmetric positive definite and has $k$ distinct
eigenvalues, then the conjugate gradient method does not require more than $k + 1$ steps to
converge.

P10.2.8 Use Theorem 10.2.6 to verify that

$$ \| x_k - A^{-1}b \|_2 \le 2\sqrt{\kappa} \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k \| x_0 - A^{-1}b \|_2. $$


Notes and References for Sec. 10.2 


The conjugate gradient method is a member of a larger class of methods that are referred 
to as conjugate direction algorithms. In a conjugate direction algorithm the search di- 
rections are all B-conjugate for some suitably chosen matrix B. A discussion of these 
methods appears in 


J.E. Dennis Jr. and K. Turner (1987). “Generalized Conjugate Directions,” Lin. Alg.
and Its Applic. 88/89, 187-209.

G.W. Stewart (1973). “Conjugate Direction Methods for Solving Systems of Linear 
Equations,” Numer. Math. 21, 284-97. 


Some historical and unifying perspectives are offered in 


G. Golub and D. O’Leary (1989). “Some History of the Conjugate Gradient and Lanczos 
Methods,” SIAM Review 31, 50-102. 

M.R. Hestenes (1990). “Conjugacy and Gradients,” in A History of Scientific Comput- 
ing, Addison-Wesley, Reading, MA. 

S. Ashby, T.A. Manteuffel, and P.E. Saylor (1992). “A Taxonomy for Conjugate Gradient 
Methods,” SIAM J. Numer. Anal. 27, 1542-1568. 


The classic reference for the conjugate gradient method is 

M.R. Hestenes and E. Stiefel (1952). “Methods of Conjugate Gradients for Solving 
Linear Systems,” J. Res. Nat. Bur. Stand. 49, 409-36. 

An exact arithmetic analysis of the method may be found in chapter 2 of 

M.R. Hestenes (1980). Conjugate Direction Methods in Optimization, Springer-Verlag, 
Berlin. 

See also 

O. Axelsson (1977). “Solution of Linear Systems of Equations: Iterative Methods,” in
Sparse Matrix Techniques: Copenhagen, 1976, ed. V.A. Barker, Springer-Verlag,
Berlin.


For a discussion of conjugate gradient convergence behavior, see 


D. G. Luenberger (1973). Introduction to Linear and Nonlinear Programming, Addison- 
Wesley, New York. 

A. van der Sluis and H.A. van der Vorst (1986). “The Rate of Convergence of Conjugate
Gradients,” Numer. Math. 48, 543-560.

The idea of using the conjugate gradient method as an iterative method was first
discussed in


J.K. Reid (1971). “On the Method of Conjugate Gradients for the Solution of Large
Sparse Systems of Linear Equations,” in Large Sparse Sets of Linear Equations, ed.
J.K. Reid, Academic Press, New York, pp. 231-54.


Several authors have attempted to explain the algorithm’s behavior in finite precision 
arithmetic. See 


H. Wozniakowski (1980). “Roundoff Error Analysis of a New Class of Conjugate Gradient 
Algorithms,” Lin. Alg. and Its Applic. 29, 

A. Greenbaum and Z. Strakos (1992). “Predicting the Behavior of Finite Precision
Lanczos and Conjugate Gradient Computations,” SIAM J. Matrix Anal. Applic. 13,
121-137.


See also the analysis in 


G.W. Stewart (1975). “The Convergence of the Method of Conjugate Gradients at 
Isolated Extreme Points in the Spectrum,” Numer. Math. 24, 85-93. 

A. Jennings (1977). “Influence of the Eigenvalue Spectrum on the Convergence Rate of
the Conjugate Gradient Method,” J. Inst. Math. Applic. 20, 61-72.

J. Cullum and R. Willoughby (1980). “The Lanczos Phenomena: An Interpretation 
Based on Conjugate Gradient Optimization,” Lin. Alg. and Its Applic. 29, 63-90. 


Finally, we mention that the method can be used to compute an eigenvector of a large 
sparse symmetric matrix: 


A. Ruhe and T. Wiberg (1972). “The Method of Conjugate Gradients Used in Inverse 
Iteration,” BIT 12, 543-54. 


10.3 Preconditioned Conjugate Gradients 


We concluded the previous section by observing that the method of
conjugate gradients works well on matrices that are either well conditioned or
have just a few distinct eigenvalues. (The latter is the case when $A$ is
a low rank perturbation of the identity.) In this section we show how to
precondition a linear system so that the matrix of coefficients assumes one
of these nice forms. Our treatment is quite brief and informal. Golub and
Meurant (1983) and Axelsson (1985) have more comprehensive expositions.


10.3.1 Derivation 


Consider the n-by-n symmetric positive definite linear system $Ax = b$. The
idea behind preconditioned conjugate gradients is to apply the “regular”
conjugate gradient method to the transformed system

$$ \tilde{A}\tilde{x} = \tilde{b}, \tag{10.3.1} $$

where $\tilde{A} = C^{-1}AC^{-1}$, $\tilde{x} = Cx$, $\tilde{b} = C^{-1}b$, and $C$ is symmetric positive
definite. In view of our remarks in §10.2.7, we should try to choose $C$
so that $\tilde{A}$ is well conditioned or a matrix with clustered eigenvalues. For
reasons that will soon emerge, the matrix $C^2$ must also be “simple.”


If we apply Algorithm 10.2.1 to (10.3.1), then we obtain the iteration


    k = 0
    $\tilde{x}_0$ = initial guess ($\tilde{A}\tilde{x}_0 \approx \tilde{b}$)
    $\tilde{r}_0 = \tilde{b} - \tilde{A}\tilde{x}_0$
    while $\tilde{r}_k \ne 0$
        k = k + 1
        if k = 1
            $\tilde{p}_1 = \tilde{r}_0$
        else                                                        (10.3.2)
            $\beta_k = \tilde{r}_{k-1}^T \tilde{r}_{k-1} / \tilde{r}_{k-2}^T \tilde{r}_{k-2}$
            $\tilde{p}_k = \tilde{r}_{k-1} + \beta_k \tilde{p}_{k-1}$
        end
        $\alpha_k = \tilde{r}_{k-1}^T \tilde{r}_{k-1} / \tilde{p}_k^T C^{-1}AC^{-1} \tilde{p}_k$
        $\tilde{x}_k = \tilde{x}_{k-1} + \alpha_k \tilde{p}_k$
        $\tilde{r}_k = \tilde{r}_{k-1} - \alpha_k C^{-1}AC^{-1} \tilde{p}_k$
    end
    $\tilde{x} = \tilde{x}_k$


Here, $\tilde{x}_k$ should be regarded as an approximation to $\tilde{x}$ and $\tilde{r}_k$ is the residual
in the transformed coordinates, i.e., $\tilde{r}_k = \tilde{b} - \tilde{A}\tilde{x}_k$. Of course, once we have $\tilde{x}$
then we can obtain $x$ via the equation $x = C^{-1}\tilde{x}$. However, it is possible to
avoid explicit reference to the matrix $C^{-1}$ by defining $\tilde{p}_k = Cp_k$, $\tilde{x}_k = Cx_k$,
and $\tilde{r}_k = C^{-1}r_k$. Indeed, if we substitute these definitions into (10.3.2) and
recall that $\tilde{b} = C^{-1}b$ and $\tilde{x} = Cx$, then we obtain


    k = 0
    $x_0$ = initial guess ($Ax_0 \approx b$)
    $r_0 = b - Ax_0$
    while $C^{-1}r_k \ne 0$
        k = k + 1
        if k = 1
            $Cp_1 = C^{-1}r_0$
        else                                                        (10.3.3)
            $\beta_k = (C^{-1}r_{k-1})^T (C^{-1}r_{k-1}) / (C^{-1}r_{k-2})^T (C^{-1}r_{k-2})$
            $Cp_k = C^{-1}r_{k-1} + \beta_k Cp_{k-1}$
        end
        $\alpha_k = (C^{-1}r_{k-1})^T (C^{-1}r_{k-1}) / (Cp_k)^T (C^{-1}AC^{-1}) (Cp_k)$
        $Cx_k = Cx_{k-1} + \alpha_k Cp_k$
        $C^{-1}r_k = C^{-1}r_{k-1} - \alpha_k (C^{-1}AC^{-1})\, Cp_k$
    end
    $Cx = Cx_k$


If we define the preconditioner $M$ by $M = C^2$ (also positive definite) and
let $z_k$ be the solution of the system $Mz_k = r_k$ then (10.3.3) simplifies to


Algorithm 10.3.1 [Preconditioned Conjugate Gradients] Given a
symmetric positive definite $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$, a symmetric positive
definite preconditioner $M$, and an initial guess $x_0$ ($Ax_0 \approx b$), the following
algorithm solves the linear system $Ax = b$.


    k = 0
    $r_0 = b - Ax_0$
    while ($r_k \ne 0$)
        Solve $Mz_k = r_k$.
        k = k + 1
        if k = 1
            $p_1 = z_0$
        else
            $\beta_k = r_{k-1}^T z_{k-1} / r_{k-2}^T z_{k-2}$
            $p_k = z_{k-1} + \beta_k p_{k-1}$
        end
        $\alpha_k = r_{k-1}^T z_{k-1} / p_k^T A p_k$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = r_{k-1} - \alpha_k A p_k$
    end
    $x = x_k$


A number of important observations should be made about this procedure:


• It can be shown that the residuals and search directions satisfy

  $$ r_j^T M^{-1} r_i = 0, \qquad i \ne j \tag{10.3.4} $$
  $$ \tilde{p}_j^T (C^{-1}AC^{-1})\, \tilde{p}_i = 0, \qquad i \ne j \tag{10.3.5} $$

  where $\tilde{p}_k = Cp_k$; equivalently, $p_j^T A p_i = 0$ for $i \ne j$.


• The denominators $r_{k-2}^T z_{k-2} = z_{k-2}^T M z_{k-2}$ never vanish because $M$
  is positive definite.


• Although the transformation $C$ figured heavily in the derivation of the
  algorithm, its action is only felt through the preconditioner $M = C^2$.


• For Algorithm 10.3.1 to be an effective sparse matrix technique, linear
  systems of the form $Mz = r$ must be easily solved and convergence
  must be rapid.
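A compact Python sketch of Algorithm 10.3.1 follows. The exact test $r_k \ne 0$ is replaced by a relative residual tolerance, and the preconditioner enters only through a user-supplied callable `solve_M` that returns $z$ with $Mz = r$; both are our assumptions. The example uses the diagonal preconditioner $M = \mathrm{diag}(A)$, a simple choice made here for illustration and not one of the preconditioners developed in this section.

```python
def pcg(A, b, x0, solve_M, eps=1e-12, kmax=1000):
    """Preconditioned conjugate gradients (Algorithm 10.3.1) with a residual-norm stop."""
    n = len(b)
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

    x = x0[:]
    Ax = matvec(x)
    r = [b[i] - Ax[i] for i in range(n)]
    norm_b = dot(b, b) ** 0.5
    p, rz_prev, k = None, None, 0
    while dot(r, r) ** 0.5 > eps * norm_b and k < kmax:
        z = solve_M(r)                 # solve M z = r for the current residual
        k += 1
        rz = dot(r, z)                 # r_{k-1}^T z_{k-1}
        if k == 1:
            p = z[:]
        else:
            beta = rz / rz_prev        # beta_k = r_{k-1}^T z_{k-1} / r_{k-2}^T z_{k-2}
            p = [z[i] + beta * p[i] for i in range(n)]
        w = matvec(p)
        alpha = rz / dot(p, w)         # alpha_k = r_{k-1}^T z_{k-1} / p_k^T A p_k
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * w[i] for i in range(n)]
        rz_prev = rz
    return x, k


# Assumed test problem: tridiagonal with growing diagonal 3, 4, ..., exact solution all ones.
n = 15
A = [[3.0 + i if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [sum(row) for row in A]
jacobi_solve = lambda r: [r[i] / A[i][i] for i in range(n)]   # M = diag(A), illustrative only
x_pcg, k_pcg = pcg(A, b, [0.0] * n, jacobi_solve)
```

Note that the matrix $C$ never appears: the code touches the preconditioner only through the solve with $M$, in line with the third observation above.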


The choice of a good preconditioner can have a dramatic effect upon the
rate of convergence. Some of the possibilities are now discussed.


10.3.2 Incomplete Cholesky Preconditioners


One of the most important preconditioning strategies involves computing an
incomplete Cholesky factorization of $A$. The idea behind this approach is
to calculate a lower triangular matrix $H$ with the property that $H$ has some
tractable sparsity structure and is somehow “close” to A’s exact Cholesky
factor $G$. The preconditioner is then taken to be $M = HH^T$. To appreciate
this choice consider the following facts:


• There exists a unique symmetric positive definite matrix $C$ such that
  $M = C^2$.

• There exists an orthogonal $Q$ such that $C = QH^T$, i.e., $H^T$ is the
  upper triangular factor of a QR factorization of $C$.

We therefore obtain the heuristic

$$
\begin{aligned}
\tilde{A} = C^{-1}AC^{-1} &= C^{-T}AC^{-1} \\
 &= (HQ^T)^{-1} A\, (QH^T)^{-1} = Q\,(H^{-1}GG^TH^{-T})\,Q^T \approx I.
\end{aligned}
\tag{10.3.6}
$$

Thus, the better $H$ approximates $G$ the smaller the condition of $\tilde{A}$, and the
better the performance of Algorithm 10.3.1.

An easy but effective way to determine such a simple $H$ that approximates
$G$ is to step through the Cholesky reduction setting $h_{ij}$ to zero if the
corresponding $a_{ij}$ is zero. Pursuing this with the outer product version of
Cholesky we obtain


    for k = 1:n
        A(k,k) = sqrt(A(k,k))
        for i = k+1:n
            if A(i,k) ≠ 0
                A(i,k) = A(i,k)/A(k,k)
            end
        end
        for j = k+1:n                                               (10.3.7)
            for i = j:n
                if A(i,j) ≠ 0
                    A(i,j) = A(i,j) - A(i,k)A(j,k)
                end
            end
        end
    end


In practice, the matrix $A$ and its incomplete Cholesky factor $H$ would
be stored in an appropriate data structure and the looping in the above
algorithm would take on a very special appearance.

Unfortunately, (10.3.7) is not always stable. Classes of positive definite 
matrices for which incomplete Cholesky is stable are identified in Manteuffel 
(1979). See also Elman (1986).
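To make the loop structure of the zero-fill procedure concrete, here is a minimal Python sketch that works on a dense list-of-lists copy of A. The function name `incomplete_cholesky` and the dense storage are our assumptions for illustration only; a practical code would use a sparse data structure. The sketch raises an error on a nonpositive pivot, which is one way a breakdown of the incomplete factorization can show up.

```python
import math


def incomplete_cholesky(A):
    """Zero-fill incomplete Cholesky, outer product form.

    A is a dense list of lists holding a symmetric positive definite
    matrix. Returns a lower triangular H with h_ij = 0 wherever the
    corresponding a_ij is zero.
    """
    n = len(A)
    H = [row[:] for row in A]  # work on a copy, leave A untouched
    for k in range(n):
        if H[k][k] <= 0.0:
            raise ValueError("nonpositive pivot: factorization broke down")
        H[k][k] = math.sqrt(H[k][k])
        # Scale column k, touching only positions that are nonzero.
        for i in range(k + 1, n):
            if H[i][k] != 0.0:
                H[i][k] /= H[k][k]
        # Outer product update of the trailing lower triangle,
        # again skipping every position that is zero.
        for j in range(k + 1, n):
            for i in range(j, n):
                if H[i][j] != 0.0:
                    H[i][j] -= H[i][k] * H[j][k]
    # The strictly upper triangle was never updated; clear it.
    for i in range(n):
        for j in range(i + 1, n):
            H[i][j] = 0.0
    return H
```

For a tridiagonal positive definite A the exact Cholesky factor has no fill, so in that case H coincides with G and $HH^T = A$ up to rounding. For a matrix such as [[4,1,1],[1,4,0],[1,0,4]] the exact factor would fill in the (3,2) position, while the sketch keeps that entry at zero.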
10.3.3 Incomplete Block Preconditioners


As with just about everything else in this book, the incomplete factoriza- 
tion ideas outlined in the previous subsection have a block analog. We 
illustrate this by looking at the incomplete block Cholesky factorization of 
the symmetric, positive definite, block tridiagonal matrix 


\[
A = \begin{bmatrix} A_1 & E_1^T & 0 \\ E_1 & A_2 & E_2^T \\ 0 & E_2 & A_3 \end{bmatrix}.
\]

For purposes of illustration, we assume that the $A_i$ are tridiagonal and the
$E_i$ are diagonal. Matrices with this structure arise from the standard 5-point
discretization of self-adjoint elliptic partial differential equations over
a two-dimensional domain.

The 3-by-3 case is sufficiently general. Our discussion is based upon 
Concus, Golub, and Meurant (1985). Let 


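\[
G = \begin{bmatrix} G_1 & 0 & 0 \\ F_1 & G_2 & 0 \\ 0 & F_2 & G_3 \end{bmatrix}
\]

be the exact block Cholesky factor of A. Although G is sparse as a block
matrix, the individual blocks are dense with the exception of $G_1$. This can
be seen from the required computations:

\begin{align*}
G_1 G_1^T &= B_1 \equiv A_1 \\
F_1 &= E_1 G_1^{-T} \\
G_2 G_2^T &= B_2 \equiv A_2 - F_1 F_1^T = A_2 - E_1 B_1^{-1} E_1^T \\
F_2 &= E_2 G_2^{-T} \\
G_3 G_3^T &= B_3 \equiv A_3 - F_2 F_2^T = A_3 - E_2 B_2^{-1} E_2^T
\end{align*}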


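We therefore seek an approximate block Cholesky factor of the form

\[
\tilde{G} = \begin{bmatrix} \tilde{G}_1 & 0 & 0 \\ \tilde{F}_1 & \tilde{G}_2 & 0 \\ 0 & \tilde{F}_2 & \tilde{G}_3 \end{bmatrix}
\]

so that we can easily solve systems that involve the preconditioner
$M = \tilde{G}\tilde{G}^T$. This involves the imposition of sparsity on $\tilde{G}$'s blocks and here is
a reasonable approach given that the $A_i$ are tridiagonal and the $E_i$ are
diagonal:

\begin{align*}
\tilde{G}_1 \tilde{G}_1^T &= \tilde{B}_1 = A_1 \\
\tilde{F}_1 &= E_1 \tilde{G}_1^{-T} \\
\tilde{G}_2 \tilde{G}_2^T &= \tilde{B}_2 = A_2 - E_1 \Lambda_1 E_1^T, \qquad \Lambda_1 \text{ (tridiagonal)} \approx \tilde{B}_1^{-1} \\
\tilde{F}_2 &= E_2 \tilde{G}_2^{-T} \\
\tilde{G}_3 \tilde{G}_3^T &= \tilde{B}_3 = A_3 - E_2 \Lambda_2 E_2^T, \qquad \Lambda_2 \text{ (tridiagonal)} \approx \tilde{B}_2^{-1}
\end{align*}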


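Note that all the $\tilde{B}_i$ are tridiagonal. Clearly, the $\Lambda_i$ must be carefully
chosen to ensure that the $\tilde{B}_i$ are also symmetric and positive definite. It
then follows that the $\tilde{G}_i$ are lower bidiagonal. The $\tilde{F}_i$ are full, but they
need not be explicitly formed. For example, in the course of solving the
system Mz = r we must solve a system of the form

\[
\begin{bmatrix} \tilde{G}_1 & 0 & 0 \\ \tilde{F}_1 & \tilde{G}_2 & 0 \\ 0 & \tilde{F}_2 & \tilde{G}_3 \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}
=
\begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix}.
\]

Forward elimination can be used to carry out matrix-vector products that
involve the $\tilde{F}_i = E_i \tilde{G}_i^{-T}$:

\begin{align*}
\tilde{G}_1 w_1 &= r_1 \\
\tilde{G}_2 w_2 &= r_2 - \tilde{F}_1 w_1 = r_2 - E_1 \tilde{G}_1^{-T} w_1 \\
\tilde{G}_3 w_3 &= r_3 - \tilde{F}_2 w_2 = r_3 - E_2 \tilde{G}_2^{-T} w_2
\end{align*}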


The choice of $\Lambda_i$ is delicate as the resulting $\tilde{B}_i$ must be positive definite.
As we have organized the computation, the central issue is how to approximate
the inverse of an m-by-m symmetric, positive definite, tridiagonal
matrix $T = (t_{ij})$ with a symmetric tridiagonal matrix $\Lambda$. There are several
reasonable approaches:

• Set $\Lambda = \mathrm{diag}(1/t_{11}, \ldots, 1/t_{mm})$.

• Take $\Lambda$ to be the tridiagonal part of $T^{-1}$. This can be efficiently
computed since there exist $u, v \in \mathbb{R}^m$ such that the lower triangular
part of $T^{-1}$ is the lower triangular part of $uv^T$. See Asplund (1959).

• Set $\Lambda = U^T U$ where U is the lower bidiagonal portion of $G^{-1}$ where
$T = GG^T$ is the Cholesky factorization. This can be found in O(m)
flops.


For a discussion of these approximations and what they imply about the 
associated preconditioners, see Concus, Golub, and Meurant (1985). 
10.3.4 Domain Decomposition Ideas


The numerical solution of elliptic partial differential equations often leads 
to linear systems of the form 


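\[
\begin{bmatrix}
A_1 & & & & B_1 \\
& A_2 & & & B_2 \\
& & \ddots & & \vdots \\
& & & A_p & B_p \\
B_1^T & B_2^T & \cdots & B_p^T & Q
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \\ z \end{bmatrix}
=
\begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_p \\ f \end{bmatrix}
\tag{10.3.8}
\]

if the unknowns are properly sequenced. See Meurant (1984). Here, the
$A_i$ are symmetric positive definite, the $B_i$ are sparse, and the last block
column is generally much narrower than the others.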

An example with p = 2 serves to connect (10.3.8) and its block structure 
with the underlying problem geometry and the chosen domain decomposi- 
tion. Suppose we are to solve Poisson's equation on the following domain: 


[Figure: a mesh on a domain split into a top subdomain (“+” mesh points) and a bottom subdomain (“×” mesh points), separated by an interface (“*” mesh points).]


With the usual discretization, an unknown at a mesh point is coupled only
to its “north”, “east”, “south”, and “west” neighbors. There are three
“types” of variables: those interior to the top subdomain (aggregated in
the subvector $x_1$ and associated with the “+” mesh points), those interior
to the bottom subdomain (aggregated in the subvector $x_2$ and associated
with the “×” mesh points), and those on the interface between the two
subdomains (aggregated in the subvector z and associated with the “*”
mesh points). Note that the interior unknowns of one subdomain are not
coupled to the interior unknowns of another subdomain, which accounts
for the zero blocks in (10.3.8). Also observe that the number of interface
unknowns is typically small compared to the overall number of unknowns.

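Now let us explore the preconditioning possibilities associated with
(10.3.8). We continue with the p = 2 case for simplicity. If we set

\[
M = L \begin{bmatrix} M_1^{-1} & 0 & 0 \\ 0 & M_2^{-1} & 0 \\ 0 & 0 & S^{-1} \end{bmatrix} L^T
\]

where

\[
L = \begin{bmatrix} M_1 & 0 & 0 \\ 0 & M_2 & 0 \\ B_1^T & B_2^T & S \end{bmatrix}
\]

then

\[
M = \begin{bmatrix} M_1 & 0 & B_1 \\ 0 & M_2 & B_2 \\ B_1^T & B_2^T & S_* \end{bmatrix}
\tag{10.3.9}
\]

with $S_* = S + B_1^T M_1^{-1} B_1 + B_2^T M_2^{-1} B_2$. Let us consider how we might
choose the block parameters $M_1$, $M_2$, and S so as to produce an effective
preconditioner.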

If we compare (10.3.9) with the p = 2 version of (10.3.8) we see that it
makes sense for $M_i$ to approximate $A_i$ and for $S_*$ to approximate Q. The
latter is achieved if $S \approx Q - B_1^T M_1^{-1} B_1 - B_2^T M_2^{-1} B_2$. There are several
approaches to selecting S and they all address the fact that we cannot form
the dense matrices $B_i^T M_i^{-1} B_i$. For example, as discussed in the previous
subsection, tridiagonal approximations of the $M_i^{-1}$ could be used. See
Meurant (1989).

If the subdomains are sufficiently regular and it is feasible to solve linear 
systems that involve the A; exactly (say by using a fast Poisson solver), then 
we can set $M_i = A_i$. It follows that M = A + E where rank(E) = m
with m being the number of interface unknowns. Thus, the preconditioned 
conjugate gradient algorithm would theoretically converge in m + 1 steps. 

Regardless of the approximations that must be incorporated in the pro- 
cess, we see that there are significant opportunities for parallelism because 
the subdomain problems are decoupled. Indeed, the number of subdomains 
p is usually a function of both the problem geometry and the number of 
processors that are available for the computation. 


10.3.5 Polynomial Preconditioners 


The vector z defined by the preconditioner system Mz = r should be
thought of as an approximate solution to Az = r insofar as M is an
approximation of A. One way to obtain such an approximate solution is to
apply p steps of a stationary method $M_1 z^{(k+1)} = N_1 z^{(k)} + r$, $z^{(0)} = 0$. It
follows that if $G = M_1^{-1} N_1$ then

\[ z \equiv z^{(p)} = (I + G + \cdots + G^{p-1}) M_1^{-1} r. \]

Thus, if $M^{-1} = (I + G + \cdots + G^{p-1}) M_1^{-1}$ then Mz = r and we can think
of M as a preconditioner. Of course, it is important that M be symmetric
positive definite and this constrains the choice of $M_1$, $N_1$, and p. Because
M is a polynomial in G it is referred to as a polynomial preconditioner.
This type of preconditioner is attractive from the vector/parallel point of
view and has therefore attracted considerable attention.
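As a small illustration, suppose the stationary method is Jacobi, so that $M_1$ is the diagonal of A and $N_1 = M_1 - A$. This choice, and the names `matvec` and `jacobi_poly_apply`, are our assumptions and not part of the discussion above. Running p sweeps from z = 0 applies the polynomial preconditioner without ever forming M:

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def jacobi_poly_apply(A, r, p):
    """Return z = (I + G + ... + G^(p-1)) M1^{-1} r.

    Here M1 = diag(A), N1 = M1 - A and G = M1^{-1} N1. The result is
    obtained by running p Jacobi sweeps M1 z_new = N1 z_old + r from z = 0.
    """
    n = len(A)
    z = [0.0] * n
    for _ in range(p):
        Az = matvec(A, z)
        # N1 z + r = M1 z - A z + r, hence z_new = z + M1^{-1} (r - A z).
        z = [z[i] + (r[i] - Az[i]) / A[i][i] for i in range(n)]
    return z
```

With p = 1 this reduces to diagonal scaling, $z = M_1^{-1} r$. Whether the implied M is positive definite for a given p depends on A, so the sketch should be read as a way to apply the polynomial, not as a guarantee that it is a valid preconditioner.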


10.3.6 Another Perspective 


The polynomial preconditioner discussion points to an important connec- 
tion between the classical iterations and the preconditioned conjugate gra- 
dient algorithm. Many iterative methods have as their basic step 


\[ x_k = x_{k-2} + \omega_k(\gamma_k z_{k-1} + x_{k-1} - x_{k-2}) \tag{10.3.10} \]

where $M z_{k-1} = r_{k-1} = b - Ax_{k-1}$. For example, if we set $\omega_k = 1$, and
$\gamma_k = 1$, then

\[ x_k = M^{-1}(b - Ax_{k-1}) + x_{k-1} \]

i.e., $M x_k = N x_{k-1} + b$, where A = M - N. Thus, the Jacobi, Gauss-Seidel,
SOR, and SSOR methods of §10.1 have the form (10.3.10). So also
does the Chebyshev semi-iterative method (10.1.12).

Following Concus, Golub, and O'Leary (1976), it is also possible to 
organize Algorithm 10.3.1 with a central step of the form (10.3.10): 


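    $x_{-1} = 0$; $x_0$ = initial guess; k = 0; $r_0 = b - Ax_0$
    while $r_k \neq 0$
        k = k + 1
        Solve $M z_{k-1} = r_{k-1}$ for $z_{k-1}$.
        $\gamma_{k-1} = z_{k-1}^T M z_{k-1} / z_{k-1}^T A z_{k-1}$
        if k = 1                                                  (10.3.11)
            $\omega_1 = 1$
        else
            $\omega_k = \left( 1 - \frac{\gamma_{k-1}}{\gamma_{k-2}} \cdot \frac{z_{k-1}^T M z_{k-1}}{z_{k-2}^T M z_{k-2}} \cdot \frac{1}{\omega_{k-1}} \right)^{-1}$
        end
        $x_k = x_{k-2} + \omega_k(\gamma_{k-1} z_{k-1} + x_{k-1} - x_{k-2})$
        $r_k = b - Ax_k$
    end
    $x = x_k$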
Thus, we can think of the scalars $\omega_k$ and $\gamma_k$ in (10.3.11) as acceleration
parameters that can be chosen to speed the convergence of the iteration
$M x_k = N x_{k-1} + b$. Hence, any iterative method based on the splitting
A = M - N can be accelerated by the conjugate gradient algorithm as long
as M (the preconditioner) is symmetric and positive definite.


Problems 


P10.3.1 Detail an incomplete factorization procedure that is based on gaxpy Cholesky, 
i.e., Algorithm 4.2.1. 


P10.3.2 How many n-vectors of storage are required by a practical implementation of
Algorithm 10.3.1? Ignore workspaces that may be required when Mz = r is solved.
10.4 Other Krylov Subspace Methods


The conjugate gradient method presented over the previous two sections 
is applicable to symmetric positive definite systems. The MINRES and 
SYMMLQ variants developed in §9.3.2 in connection with the symmetric
Lanczos process can handle symmetric indefinite systems. Now we push 
the generalizations even further in pursuit of iterative methods that are 
applicable to unsymmetric systems. 

The discussion is patterned after the survey article by Freund, Golub, 
and Nachtigal (1992) and Chapter 9 of Golub and Ortega (1993). We focus 
on cg-type algorithms that involve optimization over Krylov spaces. 
Bear in mind that there is a large gap between our algorithmic
specifications and production software. A good place to build an appreciation
for this point is the Templates book by Barrett et al. (1993). The book by
Saad (1996) is also highly recommended.


10.4.1 Normal Equation Approaches 


The method of normal equations for the least squares problem is appealing
because it allows us to use simple “Cholesky technology” instead of more
complicated methods that involve orthogonalization. Likewise, in the
unsymmetric Ax = b problem it is tempting to solve the equivalent symmetric
positive definite system

\[ A^T A x = A^T b \]

using existing conjugate gradient technology. Indeed, if we make the
substitution $A \leftarrow A^T A$ in Algorithm 10.2.1 and note that a normal equation
residual $A^T b - A^T A x_k$ is $A^T$ times the “true” residual $b - Ax_k$, then we
obtain the Conjugate Gradient Normal Equation Residual method:


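Algorithm 10.4.1 [CGNR] If $A \in \mathbb{R}^{n \times n}$ is nonsingular, $b \in \mathbb{R}^n$, and
$x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then the following algorithm
computes $x \in \mathbb{R}^n$ so Ax = b.

    k = 0
    $r_0 = b - Ax_0$
    while $r_k \neq 0$
        k = k + 1
        if k = 1
            $p_1 = A^T r_0$
        else
            $\beta_k = (A^T r_{k-1})^T (A^T r_{k-1}) / (A^T r_{k-2})^T (A^T r_{k-2})$
            $p_k = A^T r_{k-1} + \beta_k p_{k-1}$
        end
        $\alpha_k = (A^T r_{k-1})^T (A^T r_{k-1}) / (A p_k)^T (A p_k)$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = r_{k-1} - \alpha_k A p_k$
    end
    $x = x_k$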
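The CGNR recurrences translate almost line for line into code. The following Python sketch uses dense list-of-lists storage; the helper names (`matvec`, `rmatvec`, `dot`, `cgnr`), the absolute stopping tolerance `tol` and the iteration cap `maxit` are our assumptions, introduced only so that the loop terminates in floating point arithmetic.

```python
import math


def matvec(A, x):
    """Return A x for a dense list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Return A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cgnr(A, b, x0, tol=1e-12, maxit=None):
    """Conjugate gradients applied to A^T A x = A^T b (CGNR)."""
    maxit = maxit or 10 * len(b)
    x = x0[:]
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]  # true residual b - A x
    s = rmatvec(A, r)                                 # normal equation residual A^T r
    p = s[:]
    ss = dot(s, s)
    for _ in range(maxit):
        if math.sqrt(ss) <= tol:
            break
        Ap = matvec(A, p)
        alpha = ss / dot(Ap, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        s = rmatvec(A, r)
        ss_new = dot(s, s)
        beta = ss_new / ss
        p = [si + beta * pi for si, pi in zip(s, p)]
        ss = ss_new
    return x
```

Note that A is only accessed through products with A and with $A^T$, so the dense helpers could be replaced by sparse ones without touching the loop.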


Another way to make an unsymmetric Ax = b problem “cg-friendly” is to
work with the system

\[ AA^T y = b, \qquad x = A^T y. \]

In “y space” the cg algorithm takes on the following form:

    k = 0
    $y_0$ = initial guess ($AA^T y_0 \approx b$)
    $r_0 = b - AA^T y_0$
    while $r_k \neq 0$
        k = k + 1
        if k = 1
            $p_1 = r_0$
        else
            $\beta_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}$
            $p_k = r_{k-1} + \beta_k p_{k-1}$
        end
        $\alpha_k = r_{k-1}^T r_{k-1} / p_k^T AA^T p_k$
        $y_k = y_{k-1} + \alpha_k p_k$
        $r_k = r_{k-1} - \alpha_k AA^T p_k$
    end
    $y = y_k$

Making the substitutions $x_k \leftarrow A^T y_k$ and $p_k \leftarrow A^T p_k$ and simplifying we
obtain the Conjugate Gradient Normal Equation Error method:


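Algorithm 10.4.2 [CGNE] If $A \in \mathbb{R}^{n \times n}$ is nonsingular, $b \in \mathbb{R}^n$, and
$x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then the following algorithm
computes $x \in \mathbb{R}^n$ so Ax = b.

    k = 0
    $r_0 = b - Ax_0$
    while $r_k \neq 0$
        k = k + 1
        if k = 1
            $p_1 = A^T r_0$
        else
            $\beta_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}$
            $p_k = A^T r_{k-1} + \beta_k p_{k-1}$
        end
        $\alpha_k = r_{k-1}^T r_{k-1} / p_k^T p_k$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = r_{k-1} - \alpha_k A p_k$
    end
    $x = x_k$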
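A matching Python sketch of the CGNE recurrences is given below. As before, the helper names, the absolute tolerance `tol` and the iteration cap `maxit` are our assumptions. Compared with CGNR, the scalars are built from the true residual r rather than from $A^T r$.

```python
import math


def matvec(A, x):
    """Return A x for a dense list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Return A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cgne(A, b, x0, tol=1e-12, maxit=None):
    """Conjugate gradients applied to A A^T y = b with x = A^T y (CGNE)."""
    maxit = maxit or 10 * len(b)
    x = x0[:]
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    rr = dot(r, r)
    p = rmatvec(A, r)                       # p_1 = A^T r_0
    for _ in range(maxit):
        if math.sqrt(rr) <= tol:
            break
        alpha = rr / dot(p, p)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        Ap = matvec(A, p)
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        s = rmatvec(A, r)
        p = [si + beta * pi for si, pi in zip(s, p)]
        rr = rr_new
    return x
```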


In general these two normal equation approaches are handicapped by the 
squaring of the condition number. (Recall Theorem 10.2.6.) However, 
there are some occasions where they are effective and we refer the reader 
to Freund, Golub and Nachtigal (1991). 
10.4.2 A Note on Objective Functions


Based on what we know about the cg method, the CGNR iterate $x_k$ minimizes

\[ \phi_1(x) = \frac{1}{2} x^T (A^T A) x - x^T A^T b \]

over the set

\[ S_k^{(CGNR)} = x_0 + K(A^T A, A^T r_0, k). \]

It is easy to show that

\[ \frac{1}{2} \| b - Ax \|_2^2 = \phi_1(x) + \frac{1}{2} b^T b \]

and so $x_k$ minimizes the residual $\| b - Ax \|_2$ over $S_k^{(CGNR)}$. The “R” in
“CGNR” is there because of the residual-based optimization.

On the other hand, the CGNE (implicit) iterate $y_k$ minimizes

\[ \phi_2(y) = \frac{1}{2} y^T (AA^T) y - y^T b \]

over the set $y_0 + K(AA^T, b - AA^T y_0, k)$. With the change of variable
$x = A^T y$ it can be shown that $x_k$ minimizes

\[ \frac{1}{2} \| x - A^{-1} b \|_2^2 = \phi_2(y) + \frac{1}{2} \| A^{-1} b \|_2^2 \]

over

\[ S_k^{(CGNE)} = x_0 + K(A^T A, A^T r_0, k). \tag{10.4.1} \]

Thus CGNE minimizes the error at each step and that explains the “E” in
“CGNE”.

10.4.3 The Conjugate Residual Method


Recall that if A is symmetric positive definite, then it has a symmetric
positive definite square root $A^{1/2}$. (See §4.2.10.) Note that in this case
Ax = b and

\[ A^{1/2} x = A^{-1/2} b \]

are equivalent and that the former is the normal equation version of the
latter. If we apply CGNR to this square root system and simplify the
results, then we obtain


Algorithm 10.4.3 [Conjugate Residuals] If $A \in \mathbb{R}^{n \times n}$ is symmetric
positive definite, $b \in \mathbb{R}^n$, and $x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then
the following algorithm computes $x \in \mathbb{R}^n$ so Ax = b.

    k = 0
    $r_0 = b - Ax_0$
    while $r_k \neq 0$
        k = k + 1
        if k = 1
            $p_1 = r_0$
        else
            $\beta_k = r_{k-1}^T A r_{k-1} / r_{k-2}^T A r_{k-2}$
            $p_k = r_{k-1} + \beta_k p_{k-1}$
            $A p_k = A r_{k-1} + \beta_k A p_{k-1}$
        end
        $\alpha_k = r_{k-1}^T A r_{k-1} / (A p_k)^T (A p_k)$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = r_{k-1} - \alpha_k A p_k$
    end
    $x = x_k$


It follows from our comments about CGNR that $\| A^{-1/2}(b - Ax) \|_2$ is
minimized over the set $x_0 + K(A, r_0, k)$ during the kth iteration.
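The conjugate residual recurrences can be sketched in Python as follows. The helper names, the absolute tolerance `tol` and the iteration cap `maxit` are our assumptions. The sketch keeps the product $A p_k$ as a separate vector and updates it by recurrence, so that each pass through the loop needs only one matrix-vector product.

```python
import math


def matvec(A, x):
    """Return A x for a dense list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_residual(A, b, x0, tol=1e-12, maxit=None):
    """Conjugate residual iteration for symmetric positive definite A."""
    maxit = maxit or 10 * len(b)
    x = x0[:]
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    p = r[:]                   # p_1 = r_0
    Ar = matvec(A, r)
    Ap = Ar[:]                 # A p_1 = A r_0
    rAr = dot(r, Ar)
    for _ in range(maxit):
        if math.sqrt(dot(r, r)) <= tol:
            break
        alpha = rAr / dot(Ap, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        Ar = matvec(A, r)      # the only matrix-vector product per step
        rAr_new = dot(r, Ar)
        beta = rAr_new / rAr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        Ap = [ui + beta * qi for ui, qi in zip(Ar, Ap)]
        rAr = rAr_new
    return x
```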


10.4.4 GMRES 


In §9.3.2 we briefly discussed the Lanczos-based MINRES method for
symmetric, possibly indefinite, Ax = b problems. In that method the iterate
$x_k$ minimizes $\| b - Ax \|_2$ over the set

\[ S_k = x_0 + \mathrm{span}\{ r_0, A r_0, \ldots, A^{k-1} r_0 \} = x_0 + K(A, r_0, k). \tag{10.4.2} \]

The key idea behind the algorithm is to express $x_k$ in terms of the Lanczos
vectors $q_1, q_2, \ldots, q_k$ which span $K(A, r_0, k)$ if $q_1$ is a multiple of the initial
residual $r_0 = b - Ax_0$.

In the Generalized Minimum Residual (GMRES) method of Saad and
Schultz (1986) the same approach is taken except that the iterates are
expressed in terms of Arnoldi vectors instead of Lanczos vectors in order
to handle unsymmetric A. After k steps of the Arnoldi iteration (9.4.1) we
have the factorization

\[ A Q_k = Q_{k+1} \tilde{H}_k \tag{10.4.3} \]

where the columns of $Q_{k+1} = [\, Q_k \;\; q_{k+1} \,]$ are the orthonormal Arnoldi
vectors and

\[
\tilde{H}_k = \begin{bmatrix}
h_{11} & \cdots & \cdots & h_{1k} \\
h_{21} & \ddots & & h_{2k} \\
 & \ddots & \ddots & \vdots \\
0 & & h_{k,k-1} & h_{kk} \\
0 & \cdots & 0 & h_{k+1,k}
\end{bmatrix}
\]

is upper Hessenberg. In the kth step of GMRES, $\| b - Ax_k \|_2$ is minimized
subject to the constraint that $x_k$ has the form $x_k = x_0 + Q_k y_k$ for some
$y_k \in \mathbb{R}^k$. If $q_1 = r_0/\rho_0$ where $\rho_0 = \| r_0 \|_2$, then it follows that

\begin{align*}
\| b - A(x_0 + Q_k y_k) \|_2 &= \| r_0 - A Q_k y_k \|_2 \\
&= \| r_0 - Q_{k+1} \tilde{H}_k y_k \|_2 \\
&= \| \rho_0 e_1 - \tilde{H}_k y_k \|_2.
\end{align*}

Thus, $y_k$ is the solution to a (k + 1)-by-k least squares problem and the
GMRES iterate is given by $x_k = x_0 + Q_k y_k$.


Algorithm 10.4.4 [GMRES] If $A \in \mathbb{R}^{n \times n}$ is nonsingular, $b \in \mathbb{R}^n$, and
$x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then the following algorithm
computes $x \in \mathbb{R}^n$ so Ax = b.

    $r_0 = b - Ax_0$
    $h_{10} = \| r_0 \|_2$
    k = 0
    while ($h_{k+1,k} > 0$)
        $q_{k+1} = r_k / h_{k+1,k}$
        k = k + 1
        $r_k = A q_k$
        for i = 1:k
            $h_{ik} = q_i^T r_k$
            $r_k = r_k - h_{ik} q_i$
        end
        $h_{k+1,k} = \| r_k \|_2$
        $x_k = x_0 + Q_k y_k$ where $\| h_{10} e_1 - \tilde{H}_k y_k \|_2$ = min
    end
    $x = x_k$


It is easy to verify that


\[ \| b - Ax_k \|_2 = h_{k+1,k}. \]


The upper Hessenberg least squares problem can be efficiently solved using
Givens rotations. In practice there is no need to form $x_k$ until one is happy
with its residual.

The main problem with “unlimited GMRES” is that the kth iteration
involves O(kn) flops. Thus like Arnoldi, a practical GMRES implementation
requires a restart strategy to avoid excessive amounts of computation
and memory traffic. For example, if at most m steps are tolerable, then $x_m$
can be used as the initial vector for the next GMRES sequence.
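The pieces described above (Arnoldi vectors, an upper Hessenberg least squares problem solved with Givens rotations, and forming the iterate only at the end) can be put together in a short Python sketch. It is unrestarted and uses dense storage; the names `gmres`, `tol` and `max_steps` are our assumptions, and the modified Gram-Schmidt loop is one of several possible orthogonalization choices.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def gmres(A, b, x0, tol=1e-12, max_steps=None):
    """Unrestarted GMRES sketch for a dense list-of-lists matrix A."""
    max_steps = max_steps or len(b)
    r0 = [bi - yi for bi, yi in zip(b, matvec(A, x0))]
    rho = math.sqrt(dot(r0, r0))
    if rho <= tol:
        return x0[:]
    Q = [[ri / rho for ri in r0]]   # Arnoldi vectors q_1, q_2, ...
    R = []                          # columns of the triangularized Hessenberg matrix
    g = [rho]                       # rotated right-hand side rho * e_1
    rots = []                       # Givens rotations (c, s)
    for k in range(max_steps):
        w = matvec(A, Q[k])
        h = []
        for q in Q:                 # modified Gram-Schmidt against q_1..q_{k+1}
            hik = dot(q, w)
            h.append(hik)
            w = [wi - hik * qi for wi, qi in zip(w, q)]
        h_next = math.sqrt(dot(w, w))
        h.append(h_next)
        # Apply the earlier rotations to the new Hessenberg column.
        for i, (ci, si) in enumerate(rots):
            h[i], h[i + 1] = ci * h[i] + si * h[i + 1], -si * h[i] + ci * h[i + 1]
        # New rotation that annihilates the subdiagonal entry.
        d = math.hypot(h[k], h[k + 1])
        c, s = h[k] / d, h[k + 1] / d
        rots.append((c, s))
        h[k] = d
        h.pop()
        R.append(h)
        g.append(-s * g[k])
        g[k] = c * g[k]
        # |g[k+1]| is the norm of the current least squares residual.
        if abs(g[k + 1]) <= tol or h_next == 0.0:
            break
        Q.append([wi / h_next for wi in w])
    # Back substitution for y, then form x = x0 + Q y once.
    m = len(R)
    y = [0.0] * m
    for i in range(m - 1, -1, -1):
        y[i] = (g[i] - sum(R[j][i] * y[j] for j in range(i + 1, m))) / R[i][i]
    x = x0[:]
    for j in range(m):
        x = [xi + y[j] * qi for xi, qi in zip(x, Q[j])]
    return x
```

A restarted variant would call this routine repeatedly with a small `max_steps`, passing the returned vector back in as `x0`.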
10.4.5 Preconditioning


Preconditioning is the other key to making GMRES effective. Analogous
to the development of the preconditioned conjugate gradient method in
§10.3, we obtain a nonsingular matrix $M = M_1 M_2$ that approximates A
in some sense and then apply GMRES to the system $\tilde{A} \tilde{x} = \tilde{b}$ where
$\tilde{A} = M_1^{-1} A M_2^{-1}$, $\tilde{b} = M_1^{-1} b$, and $\tilde{x} = M_2 x$. If we write down the GMRES
iteration for the tilde system and manipulate the equations to restore the
original variables, then the resulting iteration requires the solution of linear
systems that involve the preconditioner M. Thus, the act of finding a good
preconditioner $M = M_1 M_2$ is the act of making $\tilde{A} = M_1^{-1} A M_2^{-1}$ look
as much as possible like the identity subject to the constraint that linear
systems with M are easy to solve.


10.4.6 The Biconjugate Gradient Method 


Just as Arnoldi underwrites GMRES, the unsymmetric Lanczos process
underwrites the Biconjugate gradient (BiCG) method. The starting point
in the development of BiCG is to go back to the Lanczos derivation of the
conjugate gradient method in §9.3.1. In terms of Lanczos vectors, the kth
cg iterate is given by $x_k = x_0 + Q_k y_k$ where $Q_k$ is the matrix of Lanczos
vectors, $T_k = Q_k^T A Q_k$ is tridiagonal, and $y_k$ solves $T_k y_k = Q_k^T r_0$. Note that

\[ Q_k^T (b - Ax_k) = Q_k^T (r_0 - A Q_k y_k) = 0. \]

Thus, we can characterize $x_k$ by insisting that it come from $x_0 + K(A, r_0, k)$
and that it produce a residual that is orthogonal to a given subspace, say
$K(A, r_0, k)$.

In the unsymmetric case we can extend this notion by producing a
sequence of iterates $\{x_k\}$ with the property that $x_k$ belongs to $x_0 + K(A, r_0, k)$
and produces a residual that is orthogonal to $K(A^T, s_0, k)$ for some $s_0 \in \mathbb{R}^n$.
Simplifications occur if the unsymmetric Lanczos process is used to generate
bases for the two involved Krylov spaces. In particular, after k steps
of the unsymmetric Lanczos algorithm (9.4.7) we have $Q_k, P_k \in \mathbb{R}^{n \times k}$ such
that $P_k^T Q_k = I_k$ and a tridiagonal matrix $T_k = P_k^T A Q_k$ such that

\[
\begin{aligned}
A Q_k &= Q_k T_k + r_k e_k^T, & P_k^T r_k &= 0 \\
A^T P_k &= P_k T_k^T + s_k e_k^T, & Q_k^T s_k &= 0
\end{aligned}
\tag{10.4.4}
\]

In BiCG we set $x_k = x_0 + Q_k y_k$ where $T_k y_k = P_k^T r_0$. Note that the Galerkin
condition

\[ P_k^T (b - Ax_k) = P_k^T (r_0 - A Q_k y_k) = 0 \]

holds.

As might be expected, it is possible to develop recursions so that $x_k$
can be computed as a simple combination of $x_{k-1}$ and $q_{k-1}$, instead of as
a linear combination of all the previous q-vectors.
The BiCG method is subject to serious breakdown because of its
pendence on the unsymmetric Lanczos process. However, by relying on 
a look-ahead Lanczos procedure it is possible to overcome some of these 
difficulties. 


10.4.7 QMR 


Another iteration that runs off of the unsymmetric Lanczos process is the
quasi-minimum residual (QMR) method of Freund and Nachtigal (1991).
As in BiCG the kth iterate has the form $x_k = x_0 + Q_k y_k$. It is easy to show
that after k steps in (9.4.7) we have the factorization

\[ A Q_k = Q_{k+1} \tilde{T}_k \]

where $\tilde{T}_k \in \mathbb{R}^{(k+1) \times k}$ is tridiagonal. It follows that if $q_1 = (b - Ax_0)/\rho$, then

\begin{align*}
b - Ax_k &= b - A(x_0 + Q_k y_k) \\
&= r_0 - A Q_k y_k \\
&= r_0 - Q_{k+1} \tilde{T}_k y_k \\
&= Q_{k+1}(\rho e_1 - \tilde{T}_k y_k).
\end{align*}

If $y_k$ is chosen to minimize the 2-norm of this vector, then in exact
arithmetic $x_0 + Q_k y_k$ defines the GMRES iterate. In QMR, $y_k$ is chosen to
minimize $\| \rho e_1 - \tilde{T}_k y_k \|_2$.


10.4.8 Summary 


The methods that we have presented do not submit to a linear ranking. 
The choice of a technique is complicated and depends on a host of factors. 
A particularly cogent assessment of the major algorithms is given in Barrett
et al. (1993).


Problems 


P10.4.1 Analogous to (10.2.16), develop efficient implementations of the CGNR, CGNE,
and conjugate residual methods.


P10.4.2 Establish the mathematical equivalence of the CGNR and the LSQR method
outlined in §9.3.4.


P10.4.3 Prove (10.4.3). 


P10.4.4 Develop an efficient preconditioned GMRES implementation, proceeding as
we did in §10.3 for the preconditioned conjugate gradient method. (See (10.3.2) and (10.3.3)
in particular.)

P10.4.5 Prove that the GMRES least squares problem has full rank. 


Notes and References for Sec. 10.4 


The following papers serve as excellent introductions to the world of unsymmetric iter- 
ation: 
S. Eisenstat, H. Elman, and M. Schultz (1983). “Variational Iterative Methods for
Nonsymmetric Systems of Equations,” SIAM J. Num. Anal. 20, 345-357.

R.W. Freund, G.H. Golub, and N. Nachtigal (1992). “Iterative Solution of Linear
Systems,” Acta Numerica 1, 57-100.

N. Nachtigal, S. Reddy, and L. Trefethen (1992). “How Fast Are Nonsymmetric Matrix
Iterations,” SIAM J. Matrix Anal. Appl. 13, 778-795.

A. Greenbaum and L.N. Trefethen (1994). “GMRES/CR and Arnoldi/Lanczos as Matrix
Approximation Problems,” SIAM J. Sci. Comp. 15, 359-368.


Krylov space methods and analysis are featured in the following papers: 


W.E. Arnoldi (1951). “The Principle of Minimized Iterations in the Solution of the 
Matrix Eigenvalue Problem,” Quart. Appl. Math. 9, 17-29. 

Y. Saad (1981). “Krylov Subspace Methods for Solving Large Unsymmetric Linear
Systems,” Math. Comp. 37, 105-126.

Y. Saad (1984). “Practical Use of Some Krylov Subspace Methods for Solving Indefinite
and Nonsymmetric Linear Systems,” SIAM J. Sci. and Stat. Comp. 5, 203-228.

Y. Saad (1989). “Krylov Subspace Methods on Supercomputers,” SIAM J. Sci. and
Stat. Comp. 10, 1200-1232.

C.-M. Huang and D.P. O'Leary (1993). “A Krylov Multisplitting Algorithm for Solving
Linear Systems of Equations,” Lin. Alg. and Its Applic. 194, 9-29.

C.C. Paige, B.N. Parlett, and H.A. Van Der Vorst (1995). “Approximate Solutions and
Eigenvalue Bounds from Krylov Subspaces,” Numer. Linear Algebra with Applic. 2,
115-133.


References for the GMRES method include 


Y. Saad and M. Schultz (1986). “GMRES: A Generalized Minimal Residual Algorithm
for Solving Nonsymmetric Linear Systems,” SIAM J. Scientific and Stat. Comp. 7,
856-869.

H.F. Walker (1988). “Implementation of the GMRES Method Using Householder Trans- 
formations,” SIAM J. Sci. Stat. Comp. 9, 152-163. 

C. Vuik and H.A. van der Vorst (1992). “A Comparison of Some GMRES-like Methods,” 
Lin. Alg. and Its Applic. 160, 131-162. 

N. Nachtigal, L. Reichel, and L. Trefethen (1992). “A Hybrid GMRES Algorithm for
Nonsymmetric Linear Systems,” SIAM J. Matrix Anal. Appl. 13, 796-825.

Y. Saad (1993). “A Flexible Inner-Outer Preconditioned GMRES Algorithm,” SIAM J. 
Sci. Comput. 14, 461-469. 

Z. Bai, D. Hu, and L. Reichel (1994). “A Newton Basis GMRES Implementation,” IMA 
J. Num. Anal. 14, 563-581. 

R.B. Morgan (1995). “A Restarted GMRES Method Augmented with Eigenvectors,”
SIAM J. Matrix Anal. Applic. 16, 1154-1171.


Preconditioning ideas for unsymmetric problems are discussed in the following papers: 


Y. Saad (1988). “Preconditioning Techniques for Indefinite and Nonsymmetric Linear
Systems,” J. Comput. Appl. Math. 24, 89-105.

L. Yu. Kolotilina and A. Yu. Yeremin (1993). “Factorized Sparse Approximate Inverse
Preconditioning I: Theory,” SIAM J. Matrix Anal. Applic. 14, 45-58.

I.E. Kaporin (1994). “New Convergence Results and Preconditioning Strategies for the
Conjugate Gradient Method,” Num. Lin. Alg. Applic. 1, 179-210.

L. Yu. Kolotilina and A. Yu. Yeremin (1995). “Factorized Sparse Approximate Inverse
Preconditioning II: Solution of 3D FE Systems on Massively Parallel Computers,”
Intern. J. High Speed Comput. 7, 191-215.

H. Elman (1996). “Fast Nonsymmetric Iterations and Preconditioning for Navier-Stokes
Equations,” SIAM J. Sci. Comput. 17, 33-46.

M. Benzi, C.D. Meyer, and M. Tuma (1996). “A Sparse Approximate Inverse
Preconditioner for the Conjugate Gradient Method,” SIAM J. Sci. Comput. 17, to appear.


Some representative papers concerned with the development of nonsymmetric conjugate 
gradient procedures include 


D.M. Young and K.C. Jea (1980). “Generalized Conjugate Gradient Acceleration of
Nonsymmetrizable Iterative Methods,” Lin. Alg. and Its Applic. 34, 159-194.

O. Axelsson (1980). “Conjugate Gradient Type Methods for Unsymmetric and Incon- 
sistent Systems of Linear Equations,” Lin. Alg. and Its Applic. 29, 1-16. 

K.C. Jea and D.M. Young (1983). “On the Simplification of Generalized Conjugate 
Gradient Methods for Nonsymmetrizable Linear Systems,” Lin. Alg. and Its Applic. 
52/53, 399-417. 

V. Faber and T. Manteuffel (1984). “Necessary and Sufficient Conditions for the
Existence of a Conjugate Gradient Method,” SIAM J. Numer. Anal. 21, 352-362.

Y. Saad and M. Schultz (1985). “Conjugate Gradient-Like Algorithms for Solving Non- 
symmetric Linear Systems,” Math. Comp. 44, 417-424. 

H.A. Van der Vorst (1986). “An Iterative Solution Method for Solving f(A)x = b Using
Krylov Subspace Information Obtained for the Symmetric Positive Definite Matrix
A,” J. Comp. and App. Math. 18, 249-263.

M.A. Saunders, H.D. Simon, and E.L. Yip (1988). “Two Conjugate Gradient-Type 
Methods for Unsymmetric Linear Equations,” SIAM J. Num. Anal. 25, 927-940. 

R. Freund (1992). “Conjugate Gradient-Type Methods for Linear Systems with Complex
Symmetric Coefficient Matrices,” SIAM J. Sci. Statist. Comput. 13, 425-448.


More Lanczos-based solvers are discussed in


Y. Saad (1982). “The Lanczos Biorthogonalization Algorithm and Other Oblique
Projection Methods for Solving Large Unsymmetric Systems,” SIAM J. Numer. Anal.
19, 485-506.

Y. Saad (1987). “On the Lanczos Method for Solving Symmetric Systems with Several 
Right Hand Sides,” Math. Comp. 48, 651-662. 

C. Brezinski and H. Sadok (1991). “Avoiding Breakdown in the CGS Algorithm,” Nu- 
mer. Alg. 1, 199-206. 

C. Brezinski, M. Zaglia, and H. Sadok (1992). “A Breakdown Free Lanczos Type Algo- 
rithm for Solving Linear Systems,” Numer. Math. 63, 29-38. 

S.K. Kim and A.T. Chronopoulos (1991). “A Class of Lanczos-Like Algorithms Imple- 
mented on Parallel Computers,” Parallel Comput. 17, 763-778. 

W. Joubert (1992). “Lanczos Methods for the Solution of Nonsymmetric Systems of 
Linear Equations,” SIAM J. Matrix Anal. Appl. 13, 926-943. 

R.W. Freund, M. Gutknecht, and N. Nachtigal (1993). “An Implementation of the 
Look-Ahead Lanczos Algorithm for Non-Hermitian Matrices,” SIAM J. Sci. and 
Stat. Comp. 14, 137-158. 


The QMR method is detailed in the following papers 


R.W. Freund and N. Nachtigal (1991). “QMR: A Quasi-Minimal Residual Method for
Non-Hermitian Linear Systems,” Numer. Math. 60, 315-339.

R.W. Freund (1993). “A Transpose-Free Quasi-Minimum Residual Algorithm for
Non-Hermitian Linear System,” SIAM J. Sci. Comput. 14, 470-482.

R.W. Freund and N.M. Nachtigal (1994). “An Implementation of the QMR Method 
Based on Coupled Two-term Recurrences,” SIAM J. Sci. Comp. 15, 313-337. 


The residuals in BiCG tend to display erratic behavior prompting the development of 
stabilizing techniques: 
H. van der Vorst (1992). “BiCGSTAB: A Fast and Smoothly Converging Variant of the
Bi-CG for the Solution of Nonsymmetric Linear Systems,” SIAM J. Sci. and Stat.
Comp. 13, 631-644.

M. Gutknecht (1993). “Variants of BICGSTAB for Matrices with Complex Spectrum,”
SIAM J. Sci. and Stat. Comp. 14, 1020-1033.

G.L.G. Sleijpen and D.R. Fokkema (1993). “BICGSTAB(l) for Linear Equations
Involving Unsymmetric Matrices with Complex Spectrum,” Electronic Transactions
on Numerical Analysis 1, 11-32.

C. Brezinski and M. Redivo-Zaglia (1995). “Look-Ahead in BiCGSTAB and Other
Product-Type Methods for Linear Systems,” BIT 35, 169-201.


In some applications it is awkward to produce matrix-vector product code for both $Ax$
and $A^T x$. Transpose free methods are popular in this context. See


P. Sonneveld (1989). “CGS, A Fast Lanczos-Type Solver for Nonsymmetric Linear
Systems,” SIAM J. Sci. and Stat. Comp. 10, 36-52.

G. Radicati di Brozolo and Y. Robert (1989). “Parallel Conjugate Gradient-like
Algorithms for Solving Sparse Nonsymmetric Linear Systems on a Vector Multiprocessor,”
Parallel Computing 11, 233-240.

C. Brezinski and M. Redivo-Zaglia (1994). “Treatment of Near-Breakdown in the CGS
Algorithms,” Numerical Algorithms 7, 33-73.

E.M. Kasenally (1995). “GMBACK: A Generalized Minimum Backward Error Algorithm
for Nonsymmetric Linear Systems,” SIAM J. Sci. Comp. 16, 698-719.

C.C. Paige, B.N. Parlett, and H.A. van der Vorst (1995). “Approximate Solutions and
Eigenvalue Bounds from Krylov Subspaces,” Num. Lin. Alg. with Applic. 2, 115-133.

M. Hochbruck and Ch. Lubich (1996). “On Krylov Subspace Approximations to the
Matrix Exponential Operator,” SIAM J. Numer. Anal., to appear.

M. Hochbruck and Ch. Lubich (1996). “Error Analysis of Krylov Method in a Nutshell,”
SIAM J. Sci. Comput., to appear.


Connections between the pseudoinverse of a rectangular matrix A and the conjugate
gradient method applied to $A^T A$ are pointed out in the paper


M. Hestenes (1975). “Pseudoinverses and Conjugate Gradients,” CACM 18, 40-43. 


Chapter 11 


Functions of Matrices 


§11.1 Eigenvalue Methods 
§11.2 Approximation Methods 
§11.3 The Matrix Exponential


Computing a function f(A) of an n-by-n matrix A is a frequently oc- 
curring problem in control theory and other application areas. Roughly 
speaking, if the scalar function f(z) is defined on $\lambda(A)$, then f(A) is
defined by substituting “A” for “z” in the “formula” for f(z). For example,
if f(z) = (1 + z)/(1 - z) and $1 \notin \lambda(A)$, then $f(A) = (I + A)(I - A)^{-1}$.

The computations get particularly interesting when the function f is 
transcendental. One approach in this more complicated situation is to 
compute an eigenvalue decomposition $A = YBY^{-1}$ and use the formula
$f(A) = Yf(B)Y^{-1}$. If B is sufficiently simple, then it is often possible
to calculate f(B) directly. This is illustrated in §11.1 for the Jordan and
Schur decompositions. Not surprisingly, reliance on the latter decomposi- 
tion results in a more stable f(A) procedure. 

Another class of methods for the matrix function problem is to approx- 
imate the desired function f(A) with an easy-to-calculate function g(A). 
For example, g might be a truncated Taylor series approximate to f. Error 
bounds associated with the approximation of matrix functions are given in 
§11.2.

In the last section we discuss the special and very important problem
of computing the matrix exponential $e^A$.


Before You Begin 


Chapters 1, 2, 3, 7 and 8 are assumed. Within this chapter there are 
the following dependencies: 
§11.1 → §11.2 → §11.3


Complementary references include Mirsky (1955), Gantmacher (1959), Bell- 
man (1969), and Horn and Johnson (1991). Some Matlab functions impor- 
tant to this chapter are expm, expm1, expm2, expm3, logm, sqrtm, and funm.


11.1 Eigenvalue Methods 


Given an n-by-n matrix A and a scalar function f(z), there are several 
ways to define the matrix function f(A). A very informal definition might 
be to substitute “A” for “z” in the formula for f(z). For example, if p(z)
= 1 + z and $r(z) = (1 - (z/2))^{-1}(1 + (z/2))$ for z ≠ 2, then it is certainly
reasonable to define p(A) and r(A) by

\[ p(A) = I + A \]

and

\[ r(A) = \left( I - \frac{A}{2} \right)^{-1} \left( I + \frac{A}{2} \right), \qquad 2 \notin \lambda(A). \]

“A-for-z” substitution also works for transcendental functions, i.e.,

\[ e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}. \]


To make subsequent algorithmic developments precise, however, we need a 
more precise definition of f( A). 


11.1.1 A Definition


There are many ways to establish rigorously the notion of a matrix function.
See Rinehart (1955). Perhaps the most elegant approach is in terms of a
line integral. Suppose f(z) is analytic inside and on a closed contour Γ
which encircles λ(A). We define f(A) to be the matrix

\[ f(A) = \frac{1}{2\pi i} \oint_{\Gamma} f(z)(zI - A)^{-1}\,dz. \tag{11.1.1} \]

This definition is immediately recognized as a matrix version of the Cauchy
integral theorem. The integral is defined on an element-by-element basis:

\[ f(A) = (f_{kj}) \quad\Longrightarrow\quad f_{kj} = \frac{1}{2\pi i} \oint_{\Gamma} f(z)\, e_k^{T}(zI - A)^{-1} e_j\,dz. \]


Notice that the entries of (zI - A)^{-1} are analytic on Γ and that f(A) is
defined whenever f(z) is analytic in a neighborhood of λ(A).
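
The definition (11.1.1) can be tried numerically on a small case. The following is a minimal sketch, not a recommended algorithm: it assumes a 2-by-2 matrix, a circular contour, and the trapezoid rule with a hypothetical point count `npts`. The helper names `inv2` and `contour_matrix_function` are illustrative only.

```python
import cmath

def inv2(m):
    """Inverse of a 2-by-2 complex matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def contour_matrix_function(f, A, center, radius, npts=64):
    """Approximate (11.1.1) by the trapezoid rule on |z - center| = radius.

    The circle is assumed to enclose the eigenvalues of the 2-by-2 matrix A.
    """
    F = [[0j, 0j], [0j, 0j]]
    for k in range(npts):
        w = radius * cmath.exp(2j * cmath.pi * k / npts)   # w = z - center
        z = center + w
        R = inv2([[z - A[0][0], -A[0][1]], [-A[1][0], z - A[1][1]]])
        # dz = i*w*dtheta, so (1/(2*pi*i)) f(z) R dz becomes f(z)*w*R/npts
        for i in range(2):
            for j in range(2):
                F[i][j] += f(z) * w * R[i][j] / npts
    return F

A = [[1.0, 1.0], [0.0, 2.0]]
F = contour_matrix_function(cmath.exp, A, center=1.5, radius=2.0)
# For this A the exact result is [[e, e^2 - e], [0, e^2]].
```

Each quadrature node requires a matrix inversion, which is consistent with the remark that the definition is of little direct computational use.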
11.1.2 The Jordan Characterization


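Although fairly useless from the computational point of view, the definition
(11.1.1) can be used to derive more practical characterizations of f(A). For
example, if f(A) is defined and

\[ A = XBX^{-1} = X\,\mathrm{diag}(B_1,\ldots,B_p)\,X^{-1}, \qquad B_i \in \mathbb{C}^{n_i \times n_i}, \]

then it is easy to verify that

\[ f(A) = Xf(B)X^{-1} = X\,\mathrm{diag}(f(B_1),\ldots,f(B_p))\,X^{-1}. \tag{11.1.2} \]

For the case when the B_i are Jordan blocks we obtain the following:

Theorem 11.1.1 Let X^{-1}AX = diag(J_1,...,J_p) be the Jordan canonical
form (JCF) of A ∈ C^{n×n} with

\[ J_i = \begin{bmatrix} \lambda_i & 1 & \cdots & 0 \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & & \ddots & 1 \\ 0 & \cdots & 0 & \lambda_i \end{bmatrix} \]

being an m_i-by-m_i Jordan block. If f(z) is analytic on an open set
containing λ(A), then

\[ f(A) = X\,\mathrm{diag}(f(J_1),\ldots,f(J_p))\,X^{-1} \]

where

\[ f(J_i) = \begin{bmatrix} f(\lambda_i) & f^{(1)}(\lambda_i) & \cdots & \dfrac{f^{(m_i-1)}(\lambda_i)}{(m_i-1)!} \\ 0 & f(\lambda_i) & \ddots & \vdots \\ \vdots & & \ddots & f^{(1)}(\lambda_i) \\ 0 & \cdots & 0 & f(\lambda_i) \end{bmatrix}. \]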

Proof. In view of the remarks preceding the statement of the theorem, it
suffices to examine f(G) where

\[ G = \lambda I + E, \qquad E = (\delta_{i,j-1}) \]

is a q-by-q Jordan block. Suppose zI - G is nonsingular. Since

\[ (zI - G)^{-1} = \sum_{k=0}^{q-1} \frac{E^{k}}{(z-\lambda)^{k+1}} \]

it follows from Cauchy's integral theorem that

\[ f(G) = \sum_{k=0}^{q-1} \left[ \frac{1}{2\pi i} \oint_{\Gamma} \frac{f(z)}{(z-\lambda)^{k+1}}\,dz \right] E^{k} = \sum_{k=0}^{q-1} \frac{f^{(k)}(\lambda)}{k!}\,E^{k}. \]

The theorem follows from the observation that E^k = (δ_{i,j-k}). □
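
The block formula of Theorem 11.1.1 says that the kth superdiagonal of f(J) is constant and equal to f^{(k)}(λ)/k!. A minimal sketch of that structure follows; the function name `jordan_block_function` and its calling convention (the caller supplies the derivative values) are assumptions made for illustration.

```python
import math

def jordan_block_function(derivs, m):
    """Return f(J) for an m-by-m Jordan block J with eigenvalue lam.

    derivs[k] must hold the k-th derivative of f at lam, k = 0, ..., m-1.
    """
    F = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            k = j - i                      # index of the superdiagonal
            F[i][j] = derivs[k] / math.factorial(k)
    return F

lam = 2.0
# Every derivative of exp at lam equals exp(lam).
E = jordan_block_function([math.exp(lam)] * 3, 3)
```

Here `E` is the exponential of the 3-by-3 Jordan block with eigenvalue 2; its (1,3) entry is exp(2)/2.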


Corollary 11.1.2 If A ∈ C^{n×n}, A = X diag(λ_1,...,λ_n) X^{-1}, and f(A) is
defined, then

\[ f(A) = X\,\mathrm{diag}(f(\lambda_1),\ldots,f(\lambda_n))\,X^{-1}. \]

Proof. The Jordan blocks are all 1-by-1. □


These results illustrate the close connection between f(A) and the eigen- 
system of A. Unfortunately, the JCF approach to the matrix function 
problem has dubious computational merit unless A is diagonalizable with 
a well-conditioned matrix of eigenvectors. Indeed, rounding errors of order 
u κ_2(X) can be expected to contaminate the computed result, since a
linear system involving the matrix X must be solved. The following example
suggests that ill-conditioned similarity transformations should be avoided 
when computing a function of a matrix. 


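Example 11.1.1 If

\[ A = \begin{bmatrix} 1 + 10^{-5} & 1 \\ 0 & 1 - 10^{-5} \end{bmatrix}, \]

then any matrix of eigenvectors is a column scaled version of

\[ X = \begin{bmatrix} 1 & -1 \\ 0 & 2(10^{-5}) \end{bmatrix} \]

and has a 2-norm condition number of order 10^5. Using a computer with machine
precision u ≈ 10^{-7} we find

\[ fl\left[ X\,\mathrm{diag}(\exp(1 + 10^{-5}), \exp(1 - 10^{-5}))\,X^{-1} \right] = \begin{bmatrix} 2.718307 & 2.750000 \\ 0.000000 & 2.718254 \end{bmatrix} \]

while

\[ e^{A} = \begin{bmatrix} 2.718309 & 2.718282 \\ 0.000000 & 2.718255 \end{bmatrix}. \]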


11.1.3 A Schur Decomposition Approach 


Some of the difficulties associated with the Jordan approach to the matrix
function problem can be circumvented by relying upon the Schur
decomposition. If A = QTQ^H is the Schur decomposition of A, then

\[ f(A) = Q f(T) Q^{H}. \]

For this to be effective, we need an algorithm for computing functions of
upper triangular matrices. Unfortunately, an explicit expression for f(T)
is very complicated as the following theorem shows.

Theorem 11.1.3 Let T = (t_ij) be an n-by-n upper triangular matrix with
λ_i = t_ii and assume f(T) is defined. If f(T) = (f_ij), then f_ij = 0 if i > j,
f_ij = f(λ_i) for i = j, and for all i < j we have

\[ f_{ij} = \sum_{(s_0,\ldots,s_k) \in S_{ij}} t_{s_0,s_1} t_{s_1,s_2} \cdots t_{s_{k-1},s_k}\, f[\lambda_{s_0},\ldots,\lambda_{s_k}] \]

where S_ij is the set of all strictly increasing sequences of integers that start
at i and end at j and f[λ_{s_0},...,λ_{s_k}] is the kth order divided difference of
f at {λ_{s_0},...,λ_{s_k}}.

Proof. See Descloux (1963), Davis (1973), or Van Loan (1975). □


Computing f(T) via Theorem 11.1.3 would require O(2^n) flops. Fortunately,
Parlett (1974) has derived an elegant recursive method for determining
the strictly upper triangular portion of the matrix F = f(T). It
requires only 2n^3/3 flops and can be derived from the following
commutativity result:

\[ FT = TF. \tag{11.1.3} \]

Indeed, by comparing (i, j) entries in this equation, we find

\[ \sum_{k=i}^{j} f_{ik} t_{kj} = \sum_{k=i}^{j} t_{ik} f_{kj}, \qquad j > i, \]

and thus, if t_ii and t_jj are distinct,

\[ f_{ij} = t_{ij}\,\frac{f_{jj} - f_{ii}}{t_{jj} - t_{ii}} + \sum_{k=i+1}^{j-1} \frac{t_{ik} f_{kj} - f_{ik} t_{kj}}{t_{jj} - t_{ii}}. \tag{11.1.4} \]

From this we conclude that f_ij is a linear combination of its neighbors to its
left and below in the matrix F. For example, the entry f_25 depends upon
f_22, f_23, f_24, f_55, f_45, and f_35. Because of this, the entire upper triangular
portion of F can be computed one superdiagonal at a time beginning with
the diagonal, f(t_11),...,f(t_nn). The complete procedure is as follows:


Algorithm 11.1.1 This algorithm computes the matrix function F = f(T)
where T is upper triangular with distinct eigenvalues and f is defined
on λ(T).

    for i = 1:n
        f_ii = f(t_ii)
    end
    for p = 1:n-1
        for i = 1:n-p
            j = i + p
            s = t_ij (f_jj - f_ii)
            for k = i+1:j-1
                s = s + t_ik f_kj - f_ik t_kj
            end
            f_ij = s/(t_jj - t_ii)
        end
    end

This algorithm requires 2n^3/3 flops. Assuming that A = QTQ^H is the
Schur form of A, f(A) = QFQ^H where F = f(T). Clearly, most of the
work in computing f(A) by this approach is in the computation of the
Schur decomposition, unless f is extremely expensive to evaluate.


Example 11.1.2 If

\[ T = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 3 & 4 \\ 0 & 0 & 5 \end{bmatrix} \]

and f(z) = (1 + z)/z then F = (f_ij) = f(T) is defined by

    f_11 = (1 + 1)/1 = 2
    f_22 = (1 + 3)/3 = 4/3
    f_33 = (1 + 5)/5 = 6/5
    f_12 = t_12 (f_22 - f_11)/(t_22 - t_11) = -2/3
    f_23 = t_23 (f_33 - f_22)/(t_33 - t_22) = -4/15
    f_13 = [t_13 (f_33 - f_11) + (t_12 f_23 - f_12 t_23)]/(t_33 - t_11) = -1/15.
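
Algorithm 11.1.1 translates almost line for line into code. The sketch below uses exact rational arithmetic so that the values quoted in Example 11.1.2 can be checked without rounding; the function name `parlett` is illustrative, and the matrix T used here is an assumption chosen to reproduce the values quoted in the example.

```python
from fractions import Fraction

def parlett(T, f):
    """Algorithm 11.1.1: F = f(T), T upper triangular with distinct diagonal."""
    n = len(T)
    F = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        F[i][i] = f(T[i][i])
    for p in range(1, n):                  # p-th superdiagonal
        for i in range(n - p):
            j = i + p
            s = T[i][j] * (F[j][j] - F[i][i])
            for k in range(i + 1, j):
                s += T[i][k] * F[k][j] - F[i][k] * T[k][j]
            F[i][j] = s / (T[j][j] - T[i][i])
    return F

T = [[Fraction(v) for v in row] for row in [[1, 2, 3], [0, 3, 4], [0, 0, 5]]]
F = parlett(T, lambda z: (1 + z) / z)
```

The division by `T[j][j] - T[i][i]` is where close or repeated eigenvalues cause trouble, which motivates the block variant discussed next.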


11.1.4 A Block Schur Approach 


If A has close or multiple eigenvalues, then Algorithm 11.1.1 leads to poor
results. In this case, it is advisable to use a block version of Algorithm
11.1.1. We outline such a procedure due to Parlett (1974a). The first
step is to choose Q in the Schur decomposition such that close or multiple
eigenvalues are clustered in blocks T_11,...,T_pp along the diagonal of T. In
particular, we must compute a partitioning

\[ T = \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1p} \\ 0 & T_{22} & \cdots & T_{2p} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & T_{pp} \end{bmatrix}, \qquad F = \begin{bmatrix} F_{11} & F_{12} & \cdots & F_{1p} \\ 0 & F_{22} & \cdots & F_{2p} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & F_{pp} \end{bmatrix} \]

where λ(T_ii) ∩ λ(T_jj) = ∅, i ≠ j. The actual determination of the block
sizes can be done using the methods of §7.6.

Next, we compute the submatrices F_ii = f(T_ii) for i = 1:p. Since the
eigenvalues of T_ii are presumably close, these calculations require special
methods. (Some possibilities are discussed in the next two sections.) Once
the diagonal blocks of F are known, the blocks in the strict upper triangle
of F can be found recursively, as in the scalar case. To derive the governing
equations, we equate (i, j) blocks in FT = TF for i < j and obtain the
following generalization of (11.1.4):

\[ F_{ij}T_{jj} - T_{ii}F_{ij} = T_{ij}F_{jj} - F_{ii}T_{ij} + \sum_{k=i+1}^{j-1} (T_{ik}F_{kj} - F_{ik}T_{kj}). \tag{11.1.5} \]

This is a linear system whose unknowns are the elements of the block F_ij
and whose right-hand side is “known” if we compute the F_ij one block
super-diagonal at a time. We can solve (11.1.5) using the Bartels-Stewart
algorithm (Algorithm 7.6.2).

The block Schur approach described here is useful when computing real
functions of real matrices. After computing the real Schur form A = QTQ^T,
the block algorithm can be invoked in order to handle the 2-by-2 bumps
along the diagonal of T.


Problems 

P11.1.1 Using the definition (11.1.1) show that (a) Af(A) = f(A)A, (b) f(A) is upper 
triangular if A is upper triangular, and (c) f(A) is Hermitian if A is Hermitian. 
P11.1.2 Rewrite Algorithm 11.1.1 so that f(T) is computed column by column. 


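P11.1.3 Suppose A = X diag(λ_i) X^{-1} where X = [x_1,...,x_n] and
X^{-1} = [y_1,...,y_n]^H. Show that if f(A) is defined, then

\[ f(A) = \sum_{k=1}^{n} f(\lambda_k)\, x_k y_k^{H}. \]

P11.1.4 Show that

\[ T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \quad\Longrightarrow\quad f(T) = \begin{bmatrix} F_{11} & F_{12} \\ 0 & F_{22} \end{bmatrix} \]

where T_11 and F_11 are p-by-p, T_22 and F_22 are q-by-q, F_11 = f(T_11) and
F_22 = f(T_22). Assume f(T) is defined.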


Notes and References for Sec. 11.1 


The contour integral representation of f(A) given in the text is useful in
functional analysis because of its generality. See

N. Dunford and J. Schwartz (1958). Linear Operators, Part I, Interscience, New York.


As we discussed, other definitions of f(A) are possible. However, for the matrix functions
typically encountered in practice, all these definitions are equivalent. See

R.F. Rinehart (1955). “The Equivalence of Definitions of a Matric Function,” Amer.
Math. Monthly 62, 395-414.


Various aspects of the Jordan representation are detailed in 


J.S. Frame (1964). “Matrix Functions and Applications, Part II,” IEEE Spectrum 1
(April), 102-8.

J.S. Frame (1964). “Matrix Functions and Applications, Part IV,” IEEE Spectrum 1
(June), 123-31.


The following are concerned with the Schur decomposition and its relationship to the 
f(A) problem: 


D. Davis (1973). “Explicit Functional Calculus," Lin. Alg. and Its Applic. 6, 193-99. 

J. Descloux (1963). “Bounds for the Spectral Norm of Functions of Matrices,” Numer. 
Math. 5, 185-90. 

C.F. Van Loan (1975). “A Study of the Matrix Exponential,” Numerical Analysis Report 
No. 10, Dept. of Maths., University of Manchester, England. 


Algorithm 11.1.1 and the various computational difficulties that arise when it is applied 
to a matrix having close or repeated eigenvalues are discussed in 


B.N. Parlett (1976). “A Recurrence Among the Elements of Functions of Triangular 
Matrices,” Lin. Alg. and Its Applic. 14, 117-21. 


A compromise between the Jordan and Schur approaches to the f(A) problem results if 
A is reduced to block diagonal form as described in §7.6.3. See 


B. Kågström (1977). “Numerical Computation of Matrix Functions,” Department of
Information Processing Report UMINF-58.77, University of Umeå, Sweden.


The sensitivity of matrix functions to perturbation is discussed in 


C.S. Kenney and A.J. Laub (1989). “Condition Estimates for Matrix Functions,” SIAM
J. Matrix Anal. Appl. 10, 191-209.

C.S. Kenney and A.J. Laub (1994). “Small-Sample Statistical Condition Estimates for
General Matrix Functions,” SIAM J. Sci. Comp. 15, 36-61.


A theme in this chapter is that if A is nonnormal, then there is more to computing f(A)
than just computing f(z) on λ(A). The pseudo-eigenvalue concept is a way of
understanding this phenomenon. See


L.N. Trefethen (1992). “Pseudospectra of Matrices," in Numerical Analysis 1991, D.F. 
Griffiths and G.A. Watson (eds), Longman Scientific & Technical, Harlow, Essex, 
UK. 


More details are offered in §11.3.4. 


11.2 Approximation Methods 


We now consider a class of methods for computing matrix functions which at
first glance do not appear to involve eigenvalues. These techniques are based
on the idea that if g(z) approximates f(z) on λ(A), then f(A) approximates
g(A), e.g.,

\[ e^{A} \approx I + A + \frac{A^{2}}{2!} + \cdots + \frac{A^{q}}{q!}. \]

We begin by bounding || f(A) - g(A) || using the Jordan and Schur matrix
function representations. We follow this discussion with some comments
on the evaluation of matrix polynomials.


11.2.1 A Jordan Analysis 


The Jordan representation of matrix functions (Theorem 11.1.1) can be 
used to bound the error in an approximant g(A) of f(A). 


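Theorem 11.2.1 Let X^{-1}AX = diag(J_1,...,J_p) be the JCF of A ∈ C^{n×n}
with

\[ J_i = \begin{bmatrix} \lambda_i & 1 & \cdots & 0 \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & & \ddots & 1 \\ 0 & \cdots & 0 & \lambda_i \end{bmatrix} \]

being an m_i-by-m_i Jordan block. If f(z) and g(z) are analytic on an open
set containing λ(A), then

\[ \| f(A) - g(A) \|_2 \le \kappa_2(X) \max_{\substack{1 \le i \le p \\ 0 \le r \le m_i - 1}} m_i\, \frac{|f^{(r)}(\lambda_i) - g^{(r)}(\lambda_i)|}{r!}. \]

Proof. Defining h(z) = f(z) - g(z) we have

\[ \| f(A) - g(A) \|_2 = \| X\,\mathrm{diag}(h(J_1),\ldots,h(J_p))\,X^{-1} \|_2 \le \kappa_2(X) \max_{1 \le i \le p} \| h(J_i) \|_2. \]

Using Theorem 11.1.1 and equation (2.3.8) we conclude that

\[ \| h(J_i) \|_2 \le m_i \max_{0 \le r \le m_i - 1} \frac{|h^{(r)}(\lambda_i)|}{r!} \]

thereby proving the theorem. □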


11.2.2 A Schur Analysis 


If we rely on the Schur instead of the Jordan decomposition we obtain an
alternative bound.

Theorem 11.2.2 Let Q^H A Q = T = diag(λ_i) + N be the Schur decomposition
of A ∈ C^{n×n}, with N being the strictly upper triangular portion of
T. If f(z) and g(z) are analytic on a closed convex set Ω whose interior
contains λ(A), then

\[ \| f(A) - g(A) \|_F \le \sum_{r=0}^{n-1} \delta_r\, \frac{\| \,|N|^{r}\, \|_F}{r!} \]

where

\[ \delta_r = \sup_{z \in \Omega} |f^{(r)}(z) - g^{(r)}(z)|. \]

Proof. Let h(z) = f(z) - g(z) and set H = (h_ij) = h(T); since
h(A) = Q h(T) Q^H, the two matrices have the same Frobenius norm. Let
S_ij^{(r)} denote the set of strictly increasing integer sequences
(s_0,...,s_r) with the property that s_0 = i and s_r = j. Notice that

\[ S_{ij} = \bigcup_{r=1}^{j-i} S_{ij}^{(r)} \]

and so from Theorem 11.1.3, we obtain the following for all i < j:

\[ h_{ij} = \sum_{r=1}^{j-i} \sum_{s \in S_{ij}^{(r)}} n_{s_0,s_1} n_{s_1,s_2} \cdots n_{s_{r-1},s_r}\, h[\lambda_{s_0},\ldots,\lambda_{s_r}]. \]

Now since Ω is convex and h analytic, we have

\[ |h[\lambda_{s_0},\ldots,\lambda_{s_r}]| \le \sup_{z \in \Omega} \frac{|h^{(r)}(z)|}{r!} = \frac{\delta_r}{r!}. \tag{11.2.1} \]

Furthermore if |N|^r = (n_ij^{(r)}) for r ≥ 1, then it can be shown that

\[ n_{ij}^{(r)} = \begin{cases} 0 & j < i + r \\[4pt] \displaystyle\sum_{s \in S_{ij}^{(r)}} |n_{s_0,s_1}|\,|n_{s_1,s_2}| \cdots |n_{s_{r-1},s_r}| & j \ge i + r. \end{cases} \tag{11.2.2} \]

The theorem now follows by taking absolute values in the expression for
h_ij and then using (11.2.1) and (11.2.2). □


The bounds in the above theorems suggest that there is more to approximating
f(A) than just approximating f(z) on the spectrum of A. In particular,
we see that if the eigensystem of A is ill-conditioned and/or A's departure
from normality is large, then the discrepancy between f(A) and g(A) may
be considerably larger than the maximum of |f(z) - g(z)| on λ(A). Thus,
even though approximation methods avoid eigenvalue computations, they
appear to be influenced by the structure of A's eigensystem, a point that
we pursue further in the next section.

Example 11.2.1 Suppose

\[ A = \begin{bmatrix} -.01 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & .01 \end{bmatrix}. \]

If f(z) = e^z and g(z) = 1 + z + z^2/2, then || f(A) - g(A) || ≈ 10^{-5} in either the
Frobenius norm or the 2-norm. Since κ_2(X) ≈ 10^7, the error predicted by Theorem
11.2.1 is O(1), rather pessimistic. On the other hand, the error predicted by the Schur
decomposition approach is O(10^{-2}).


11.2.3 Taylor Approximants


A popular way of approximating a matrix function such as e^A is through
the truncation of its Taylor series. The conditions under which a matrix
function f(A) has a Taylor series representation are easily established.


Theorem 11.2.3 If f(z) has a power series representation

\[ f(z) = \sum_{k=0}^{\infty} c_k z^{k} \]

on an open disk containing λ(A), then

\[ f(A) = \sum_{k=0}^{\infty} c_k A^{k}. \]

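Proof. We prove the theorem for the case when A is diagonalizable. In
P11.2.1, we give a hint as to how to proceed without this assumption.
Suppose X^{-1}AX = D = diag(λ_1,...,λ_n). Using Corollary 11.1.2, we
have

\[ \begin{aligned} f(A) &= X\,\mathrm{diag}(f(\lambda_1),\ldots,f(\lambda_n))\,X^{-1} \\ &= X\,\mathrm{diag}\!\left( \sum_{k=0}^{\infty} c_k \lambda_1^{k},\ldots,\sum_{k=0}^{\infty} c_k \lambda_n^{k} \right) X^{-1} \\ &= X \left( \sum_{k=0}^{\infty} c_k D^{k} \right) X^{-1} = \sum_{k=0}^{\infty} c_k (XDX^{-1})^{k} = \sum_{k=0}^{\infty} c_k A^{k}. \;\Box \end{aligned} \]

Several important transcendental matrix functions have particularly simple
series representations:

\[ \begin{aligned} \log(I - A) &= -\sum_{k=1}^{\infty} \frac{A^{k}}{k}, \qquad |\lambda| < 1,\ \lambda \in \lambda(A) \\ \sin(A) &= \sum_{k=0}^{\infty} (-1)^{k} \frac{A^{2k+1}}{(2k+1)!} \\ \cos(A) &= \sum_{k=0}^{\infty} (-1)^{k} \frac{A^{2k}}{(2k)!} \end{aligned} \]

The following theorem bounds the errors that arise when matrix functions
such as these are approximated via truncated Taylor series.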


Theorem 11.2.4 If f(z) has the Taylor series

\[ f(z) = \sum_{k=0}^{\infty} \alpha_k z^{k} \]

on an open disk containing the eigenvalues of A ∈ C^{n×n}, then

\[ \left\| f(A) - \sum_{k=0}^{q} \alpha_k A^{k} \right\|_2 \le \frac{n}{(q+1)!} \max_{0 \le s \le 1} \| A^{q+1} f^{(q+1)}(As) \|_2. \]

Proof. Define the matrix E(s) by

\[ f(As) = \sum_{k=0}^{q} \alpha_k (As)^{k} + E(s), \qquad 0 \le s \le 1. \tag{11.2.3} \]

If f_ij(s) is the (i, j) entry of f(As), then it is necessarily analytic and so

\[ f_{ij}(s) = \sum_{k=0}^{q} \frac{f_{ij}^{(k)}(0)}{k!}\, s^{k} + \frac{f_{ij}^{(q+1)}(\varepsilon_{ij})}{(q+1)!}\, s^{q+1} \tag{11.2.4} \]

where ε_ij satisfies 0 ≤ ε_ij ≤ s ≤ 1.
By comparing powers of s in (11.2.3) and (11.2.4) we conclude that
e_ij(s), the (i, j) entry of E(s), has the form

\[ e_{ij}(s) = \frac{f_{ij}^{(q+1)}(\varepsilon_{ij})}{(q+1)!}\, s^{q+1}. \]

Now f_ij^{(q+1)}(s) is the (i, j) entry of A^{q+1} f^{(q+1)}(As) and therefore

\[ |e_{ij}(s)| \le \max_{0 \le s \le 1} \frac{|f_{ij}^{(q+1)}(s)|}{(q+1)!} \le \max_{0 \le s \le 1} \frac{\| A^{q+1} f^{(q+1)}(As) \|_2}{(q+1)!}. \]

The theorem now follows by applying (2.3.8). □


Example 11.2.2 If

\[ A = \begin{bmatrix} -49 & 24 \\ -64 & 31 \end{bmatrix} \]

then

\[ e^{A} = \begin{bmatrix} -0.735759 & 0.551819 \\ -1.471518 & 1.103638 \end{bmatrix}. \]

For q = 59, Theorem 11.2.4 predicts that

\[ \left\| e^{A} - \sum_{k=0}^{q} \frac{A^{k}}{k!} \right\|_2 \le \frac{n}{(q+1)!} \max_{0 \le s \le 1} \| A^{q+1} e^{As} \|_2 \le 10^{-60}. \]

However, if u ≈ 10^{-7}, then we find

\[ fl\left( \sum_{k=0}^{59} \frac{A^{k}}{k!} \right) = \begin{bmatrix} -22.25880 & -1.4322766 \\ -61.49031 & -3.474280 \end{bmatrix}. \]

The problem is that some of the partial sums have large elements. For example,
I + ··· + A^{17}/17! has entries of order 10^7. Since the machine precision is
approximately 10^{-7}, rounding errors larger than the norm of the solution are
sustained.


Example 11.2.2 highlights a shortcoming of truncated Taylor series approx- 
imation: It tends to be worthwhile only near the origin. The problem can 
sometimes be circumvented through a change of scale. For example, by 
repeated application of the double angle formulae: 


\[ \cos(2A) = 2\cos(A)^{2} - I, \qquad \sin(2A) = 2\sin(A)\cos(A) \]

it is possible to “build up” the sine and cosine of a matrix from suitably
truncated Taylor series approximates:

    S_0 = Taylor approximate to sin(A/2^k)
    C_0 = Taylor approximate to cos(A/2^k)
    for j = 1:k
        S_j = 2 S_{j-1} C_{j-1}
        C_j = 2 C_{j-1}^2 - I
    end

Here k is a positive integer chosen so that, say, || A ||_∞ ≈ 2^k. See Serbin
and Blalock (1979).
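
The double angle recursion above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the method of Serbin and Blalock: matrices are nested lists, the number of Taylor terms (`terms=8`) is an arbitrary choice, and k is taken as the smallest non-negative integer with || A ||_∞ ≤ 2^k.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def taylor_sin_cos(A, terms=8):
    """Truncated Taylor sums for sin(A) and cos(A)."""
    n = len(A)
    S = [[0.0] * n for _ in range(n)]
    C = [[0.0] * n for _ in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]  # A^m / m!
    for m in range(2 * terms):
        sign = (-1) ** (m // 2)
        target = C if m % 2 == 0 else S    # even powers -> cos, odd -> sin
        for i in range(n):
            for j in range(n):
                target[i][j] += sign * P[i][j]
        P = [[x / (m + 1) for x in row] for row in matmul(P, A)]
    return S, C

def sin_cos_double_angle(A):
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)      # infinity norm
    k = max(0, math.ceil(math.log2(norm))) if norm > 0 else 0
    S, C = taylor_sin_cos([[x / 2 ** k for x in row] for row in A])
    for _ in range(k):
        SC, CC = matmul(S, C), matmul(C, C)
        S = [[2 * x for x in row] for row in SC]
        C = [[2 * CC[i][j] - (i == j) for j in range(n)] for i in range(n)]
    return S, C

S, C = sin_cos_double_angle([[3.0, 0.0], [0.0, -5.0]])
```

For this diagonal test matrix the results can be compared entrywise with the scalar sine and cosine.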
11.2.4 Evaluating Matrix Polynomials


Since the approximation of transcendental matrix functions so often
involves the evaluation of polynomials, it is worthwhile to look at the details
of computing

\[ p(A) = b_0 I + b_1 A + \cdots + b_q A^{q} \]

where the scalars b_0,...,b_q ∈ R are given. The most obvious approach is
to invoke Horner's scheme:

Algorithm 11.2.1 Given a matrix A and b(0:q), the following algorithm
computes F = b_q A^q + ··· + b_1 A + b_0 I.

    F = b_q A + b_{q-1} I
    for k = q-2:-1:0
        F = AF + b_k I
    end

This requires q - 1 matrix multiplications. However, unlike the scalar case,
this summation process is not optimal. To see why, suppose q = 9 and
observe that

\[ p(A) = A^{3}\bigl(A^{3}(b_9 A^{3} + (b_8 A^{2} + b_7 A + b_6 I)) + (b_5 A^{2} + b_4 A + b_3 I)\bigr) + b_2 A^{2} + b_1 A + b_0 I. \]

Thus, F = p(A) can be evaluated with only four matrix multiplies:

    A_2 = A^2
    A_3 = A A_2
    F_1 = b_9 A_3 + b_8 A_2 + b_7 A + b_6 I
    F_2 = A_3 F_1 + b_5 A_2 + b_4 A + b_3 I
    F   = A_3 F_2 + b_2 A_2 + b_1 A + b_0 I.


In general, if s is any integer satisfying 1 ≤ s ≤ √q then

\[ p(A) = \sum_{k=0}^{r} B_k (A^{s})^{k}, \qquad r = \mathrm{floor}(q/s) \tag{11.2.5} \]

where

\[ B_k = \begin{cases} b_{sk+s-1} A^{s-1} + \cdots + b_{sk+1} A + b_{sk} I & k = 0{:}r-1 \\ b_q A^{q-sr} + \cdots + b_{sr+1} A + b_{sr} I & k = r. \end{cases} \]

Once A^2,...,A^s are computed, Horner's rule can be applied to (11.2.5)
and the net result is that p(A) can be computed with s + r - 1 matrix
multiplies. By choosing s = floor(√q), the number of matrix multiplies
is approximately minimized. This technique is discussed in Paterson and
Stockmeyer (1973). Van Loan (1978) shows how the procedure can be
implemented without storage arrays for A^2,...,A^s.
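
The grouping in (11.2.5) can be sketched as follows. This illustration stores the powers A^2,...,A^s explicitly (so it does not reflect the storage-free variant mentioned above), always spends one multiply per Horner step, and returns the multiply count so that it can be compared with s + r - 1. The function names are assumptions made for the example.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def paterson_stockmeyer(A, b):
    """Evaluate p(A) = sum_k b[k] A^k via (11.2.5); return (p(A), multiplies)."""
    n, q = len(A), len(b) - 1
    s = max(1, math.isqrt(q))
    r = q // s
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    powers, mults = [identity, A], 0     # powers[m] = A^m for m = 0..s
    for _ in range(2, s + 1):
        powers.append(matmul(powers[-1], A))
        mults += 1

    def block(k):                        # coefficient matrix B_k of (11.2.5)
        top = q - s * k if k == r else s - 1
        return [[sum(b[s * k + m] * powers[m][i][j] for m in range(top + 1))
                 for j in range(n)] for i in range(n)]

    F = block(r)
    for k in range(r - 1, -1, -1):       # Horner's rule in the variable A^s
        F = matmul(powers[s], F)
        mults += 1
        Bk = block(k)
        F = [[F[i][j] + Bk[i][j] for j in range(n)] for i in range(n)]
    return F, mults

A = [[1, 1], [0, 1]]
P, count = paterson_stockmeyer(A, [1] * 10)   # I + A + ... + A^9
```

With q = 9 the sketch uses s = r = 3 and hence s + r - 1 = 5 multiplies; the four-multiply count in the text is obtained because B_3 = b_9 I is a scalar multiple of the identity.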


11.2.5 Computing Powers of a Matrix 


The problem of raising a matrix to a given power deserves special mention.
Suppose it is required to compute A^13. Noting that A^4 = (A^2)^2, A^8 =
(A^4)^2 and A^13 = A^8 A^4 A, we see that this can be accomplished with just 5
matrix multiplications. In general we have

Algorithm 11.2.2 (Binary Powering) Given a positive integer s and
A ∈ R^{n×n}, the following algorithm computes F = A^s.

    Let s = Σ_{k=0}^{t} β_k 2^k be the binary expansion of s with β_t ≠ 0.
    Z = A; q = 0
    while β_q = 0
        Z = Z^2; q = q + 1
    end
    F = Z
    for k = q+1:t
        Z = Z^2
        if β_k ≠ 0
            F = FZ
        end
    end

This algorithm requires at most 2 floor[log_2(s)] matrix multiplies. If s is a
power of 2, then only log_2(s) matrix multiplies are needed.
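
Algorithm 11.2.2 can be written out as below. The sketch returns the number of matrix multiplies alongside the power so the counts stated above can be checked; `binary_power` and `matmul` are illustrative names.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def binary_power(A, s):
    """Algorithm 11.2.2: return (A^s, number of matrix multiplies), s >= 1."""
    bits = [int(c) for c in bin(s)[:1:-1]]    # beta_0, beta_1, ..., beta_t
    Z, q, mults = A, 0, 0
    while bits[q] == 0:                       # skip trailing zero bits
        Z = matmul(Z, Z)
        q += 1
        mults += 1
    F = Z
    for k in range(q + 1, len(bits)):
        Z = matmul(Z, Z)
        mults += 1
        if bits[k]:
            F = matmul(F, Z)
            mults += 1
    return F, mults

A = [[1, 1], [0, 1]]
F13, m13 = binary_power(A, 13)
F8, m8 = binary_power(A, 8)
```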


11.2.6 Integrating Matrix Functions 


We conclude this section with some remarks on the integration of matrix
functions. Suppose f(At) is defined for all t ∈ [a,b] and that we wish to
compute

\[ F = \int_{a}^{b} f(At)\,dt. \]

As in (11.1.1) the integration is on an element-by-element basis.
Ordinary quadrature rules can be applied to F. For example, with
Simpson's rule, we have

\[ F \approx \tilde{F} = \frac{h}{3} \sum_{k=0}^{m} w_k f(A(a + kh)) \tag{11.2.6} \]

where m is even, h = (b - a)/m and

\[ w_k = \begin{cases} 1 & k = 0, m \\ 4 & k \text{ odd} \\ 2 & k \text{ even},\ k \ne 0, m. \end{cases} \]

If (d^4/dz^4) f(zt) = f^{(4)}(zt) is continuous for t ∈ [a,b] and if f^{(4)}(At) is
defined on this same interval, then it can be shown that F̃ = F + E where

\[ \| E \|_2 \le \frac{n h^{4}(b-a)}{180} \max_{a \le t \le b} \| f^{(4)}(At) \|_2. \tag{11.2.7} \]

Let f_ij and e_ij denote the (i, j) entries of F and E, respectively. Under the
above assumptions we can apply the standard error bounds for Simpson's
rule and obtain

\[ |e_{ij}| \le \frac{h^{4}(b-a)}{180} \max_{a \le t \le b} |e_i^{T} f^{(4)}(At) e_j|. \]

The inequality (11.2.7) now follows since || E ||_2 ≤ n max |e_ij| and

\[ \max_{a \le t \le b} |e_i^{T} f^{(4)}(At) e_j| \le \max_{a \le t \le b} \| f^{(4)}(At) \|_2. \]

Of course, in the practical application of (11.2.6), the function evaluations
f(A(a + kh)) normally have to be approximated. Thus, the overall error
involves the error in approximating f(A(a + kh)) as well as the Simpson rule
error.

Problems 


P11.2.1 (a) Suppose G = λI + E is a p-by-p Jordan block, where E = (δ_{i,j-1}). Show
that

\[ (\lambda I + E)^{k} = \sum_{j=0}^{\min\{p-1,k\}} \binom{k}{j} \lambda^{k-j} E^{j}. \]

(b) Use (a) and Theorem 11.1.1 to prove Theorem 11.2.3.

P11.2.2 Verify (11.2.2).

P11.2.3 Show that if || A ||_2 < 1, then log(I + A) exists and satisfies the bound

\[ \| \log(I + A) \|_2 \le \| A \|_2/(1 - \| A \|_2). \]

P11.2.4 Let A be an n-by-n symmetric positive definite matrix. (a) Show that there
exists a unique symmetric positive definite X such that A = X^2. (b) Show that if
X_0 = I and X_{k+1} = (X_k + A X_k^{-1})/2 then X_k → √A quadratically where √A denotes
the matrix X in part (a).

P11.2.5 Specialize Algorithm 11.2.1 to the case when A is symmetric. Repeat for the
case when A is upper triangular. In both instances, give the associated flop counts.

P11.2.6 Show that X(t) = C_1 cos(t√A) + C_2 (√A)^{-1} sin(t√A) solves the initial value
problem Ẍ(t) = -AX(t), X(0) = C_1, Ẋ(0) = C_2. Assume that A is symmetric positive
definite.

P11.2.7 Using Theorem 11.2.4, bound the error in the approximations:

\[ \sin(A) \approx \sum_{k=0}^{q} (-1)^{k} \frac{A^{2k+1}}{(2k+1)!}, \qquad \cos(A) \approx \sum_{k=0}^{q} (-1)^{k} \frac{A^{2k}}{(2k)!}. \]

P11.2.8 Suppose A ∈ R^{n×n} is nonsingular and X_0 ∈ R^{n×n} is given. The iteration
defined by

\[ X_{k+1} = X_k(2I - AX_k) \]

is the matrix analog of Newton's method applied to the function f(z) = a - (1/z). Use
the SVD to analyze this iteration. Do the iterates converge to A^{-1}? Discuss the choice
of X_0.


Notes and References for Sec. 11.2 


The optimality of Horner's rule for polynomial evaluation is discussed in 


D. Knuth (1981). The Art of Computer Programming, vol. 2. Seminumerical
Algorithms, 2nd ed., Addison-Wesley, Reading, Massachusetts.

M.S. Paterson and L.J. Stockmeyer (1973). “On the Number of Nonscalar
Multiplications Necessary to Evaluate Polynomials,” SIAM J. Comp. 2, 60-66.


The Horner evaluation of matrix polynomials is analyzed in 


C.F. Van Loan (1978). “A Note on the Evaluation of Matrix Polynomials,” IEEE Trans. 
Auto. Cont. AC-24, 320-21. 


Other aspects of matrix function computation are discussed in 


N.J. Higham and P.A. Knight (1995). “Matrix Powers in Finite Precision Arithmetic,”
SIAM J. Matrix Anal. Appl. 16, 343-358.

R. Mathias (1993). “Approximation of Matrix-Valued Functions,” SIAM J. Matrix Anal.
Appl. 14, 1061-1063.

S. Friedland (1991). “Revisiting Matrix Squaring,” Lin. Alg. and Its Applic. 154-156,
59-63.

H. Bolz and W. Niethammer (1988). “On the Evaluation of Matrix Functions Given by
Power Series,” SIAM J. Matrix Anal. Appl. 9, 202-209.


The Newton and Lagrange representations for f(A) and their relationship to other
matrix function definitions are discussed in

R.F. Rinehart (1955). “The Equivalence of Definitions of a Matric Function,” Amer.
Math. Monthly 62, 395-414.

The “double angle” method for computing the cosine of a matrix is analyzed in

S. Serbin and S. Blalock (1979). “An Algorithm for Computing the Matrix Cosine,”
SIAM J. Sci. Stat. Comp. 1, 198-204.


The square root is a particularly important matrix function. See §4.2.10. Several
approaches are possible:

Å. Björck and S. Hammarling (1983). “A Schur Method for the Square Root of a Matrix,”
Lin. Alg. and Its Applic. 52/53, 127-140.

N.J. Higham (1986). “Newton’s Method for the Matrix Square Root,” Math. Comp. 
46, 537-550. 

N.J. Higham (1987). “Computing Real Square Roots of a Real Matrix,” Lin. Alg. and 
Its Applic. 88/89, 405—430. 


11.3 The Matrix Exponential 


One of the most frequently computed matrix functions is the exponential

\[ e^{At} = \sum_{k=0}^{\infty} \frac{(At)^{k}}{k!}. \]

Numerous algorithms for computing e^{At} have been proposed, but most of
them are of dubious numerical quality, as is pointed out in the survey article
by Moler and Van Loan (1978). In order to illustrate what the computational
difficulties are, we present a “scaling and squaring” method based
upon Padé approximation. A brief analysis of the method follows that
involves some e^{At} perturbation theory and comments about the shortcomings
of eigenanalysis in settings where non-normality prevails.


11.3.1 A Padé Approximation Method


Following the discussion in §11.2, if g(z) ≈ e^z, then g(A) ≈ e^A. A very
useful class of approximants for this purpose are the Padé functions defined
by

\[ R_{pq}(z) = D_{pq}(z)^{-1} N_{pq}(z), \]

where

\[ N_{pq}(z) = \sum_{k=0}^{p} \frac{(p+q-k)!\,p!}{(p+q)!\,k!\,(p-k)!}\, z^{k} \]

and

\[ D_{pq}(z) = \sum_{k=0}^{q} \frac{(p+q-k)!\,q!}{(p+q)!\,k!\,(q-k)!}\, (-z)^{k}. \]

Notice that R_{p0}(z) = 1 + z + ··· + z^p/p! is the pth order Taylor polynomial.
Unfortunately, the Padé approximants are good only near the origin, as
the following identity reveals:

\[ e^{A} = R_{pq}(A) + \frac{(-1)^{q}}{(p+q)!}\, A^{p+q+1} D_{pq}(A)^{-1} \int_{0}^{1} u^{p}(1-u)^{q} e^{A(1-u)}\,du. \tag{11.3.1} \]

However, this problem can be overcome by exploiting the fact that e^A =
(e^{A/m})^m. In particular, we can scale A by m such that F_pq = R_pq(A/m)
is a suitably accurate approximation to e^{A/m}. We then compute F_pq^m using
Algorithm 11.2.2. If m is a power of two, then this amounts to repeated
squaring and so is very efficient. The success of the overall procedure
depends on the accuracy of the approximant

\[ F_{pq} = \left( R_{pq}\!\left( \frac{A}{2^{j}} \right) \right)^{2^{j}}. \]

In Moler and Van Loan (1978) it is shown that if

\[ \frac{\| A \|_\infty}{2^{j}} \le \frac{1}{2} \]

then there exists an E ∈ R^{n×n} such that

\[ \begin{aligned} F_{pq} &= e^{A+E} \\ AE &= EA \\ \| E \|_\infty &\le \epsilon(p,q) \| A \|_\infty \\ \epsilon(p,q) &= 2^{3-(p+q)} \frac{p!\,q!}{(p+q)!\,(p+q+1)!}. \end{aligned} \]

These results form the basis of an effective e^A procedure with error control.
Using the above formulae it is easy to establish the inequality:

\[ \frac{\| e^{A} - F_{pq} \|_\infty}{\| e^{A} \|_\infty} \le \epsilon(p,q) \| A \|_\infty\, e^{\epsilon(p,q) \| A \|_\infty}. \]

The parameters p and q can be determined according to some relative
error tolerance. Note that since F_pq requires about j + max(p,q) matrix
multiplies it makes sense to set p = q as this choice minimizes ε(p,q) for a
given amount of work. Encapsulating these ideas we obtain


Algorithm 11.3.1 Given δ > 0 and A ∈ R^{n×n}, the following algorithm
computes F = e^{A+E} where || E ||_∞ ≤ δ || A ||_∞.

    j = max(0, 1 + floor(log_2(|| A ||_∞)))
    A = A/2^j
    Let q be the smallest non-negative integer such that ε(q,q) ≤ δ.
    D = I; N = I; X = I; c = 1
    for k = 1:q
        c = c(q - k + 1)/[(2q - k + 1)k]
        X = AX; N = N + cX; D = D + (-1)^k cX
    end
    Solve DF = N for F using Gaussian elimination.
    for k = 1:j
        F = F^2
    end

This algorithm requires about 2(q + j + 1/3)n^3 flops. The roundoff error
properties of this algorithm have essentially been analyzed by Ward (1977).

The special Horner techniques of §11.2 can be applied to quicken the
computation of D = D_qq(A) and N = N_qq(A). For example, if q = 8 we
have N_qq(A) = U + AV and D_qq(A) = U - AV where

\[ U = c_0 I + c_2 A^{2} + (c_4 I + c_6 A^{2} + c_8 A^{4}) A^{4} \]

and

\[ V = c_1 I + c_3 A^{2} + (c_5 I + c_7 A^{2}) A^{4}. \]

Clearly, N and D can be found in 5 matrix multiplies rather than the 7
required by Algorithm 11.3.1.
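
A compact sketch of scaling and squaring with a diagonal Padé approximant, in the spirit of Algorithm 11.3.1, is given below. It is an illustration under stated assumptions rather than a production routine: matrices are nested lists, the default tolerance `delta=1e-12` is arbitrary, the linear system is solved by a simple Gaussian elimination with partial pivoting, and none of the Horner-type savings described above are used. All function names are hypothetical.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def solve(D, N):
    """Solve D F = N by Gaussian elimination with partial pivoting."""
    n = len(D)
    M = [D[i][:] + N[i][:] for i in range(n)]          # augmented [D | N]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            M[r] = [x - t * y for x, y in zip(M[r], M[c])]
    F = [[0.0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):                     # back substitution
        for j in range(n):
            s = M[i][n + j] - sum(M[i][k] * F[k][j] for k in range(i + 1, n))
            F[i][j] = s / M[i][i]
    return F

def eps(p, q):
    """The quantity epsilon(p, q) from the error analysis above."""
    f = math.factorial
    return 2.0 ** (3 - (p + q)) * f(p) * f(q) / (f(p + q) * f(p + q + 1))

def expm_pade(A, delta=1e-12):
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)  # infinity norm
    j = max(0, 1 + math.floor(math.log2(norm))) if norm > 0 else 0
    A = [[x / 2 ** j for x in row] for row in A]       # scaling
    q = 0
    while eps(q, q) > delta:
        q += 1
    I = [[float(i == k) for k in range(n)] for i in range(n)]
    D, N, X, c = [r[:] for r in I], [r[:] for r in I], I, 1.0
    for k in range(1, q + 1):
        c = c * (q - k + 1) / ((2 * q - k + 1) * k)
        X = matmul(A, X)
        for i in range(n):
            for m in range(n):
                N[i][m] += c * X[i][m]
                D[i][m] += (-1) ** k * c * X[i][m]
    F = solve(D, N)
    for _ in range(j):                                 # repeated squaring
        F = matmul(F, F)
    return F

F = expm_pade([[-49.0, 24.0], [-64.0, 31.0]])
```

The test matrix is the one from Example 11.2.2, whose eigenvalues are -1 and -17, so the result can be compared with the exponential obtained from its eigendecomposition.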


11.3.2 Perturbation Theory


Is Algorithm 11.3.1 stable in the presence of roundoff error? To answer this 
question we need to understand the sensitivity of the matrix exponential to 
perturbations in A. The starting point in the discussion is the initial value 
problem 

\[ \dot{X}(t) = AX(t), \qquad X(0) = I \]

where A, X(t) ∈ R^{n×n}. This has the unique solution X(t) = e^{At}, a
characterization of the matrix exponential that can be used to establish the
identity

\[ e^{(A+E)t} = e^{At} + \int_{0}^{t} e^{A(t-s)} E e^{(A+E)s}\,ds. \]

From this it follows that

\[ \frac{\| e^{(A+E)t} - e^{At} \|_2}{\| e^{At} \|_2} \le \frac{\| E \|_2}{\| e^{At} \|_2} \int_{0}^{t} \| e^{A(t-s)} \|_2 \| e^{(A+E)s} \|_2\,ds. \]

Further simplifications result if we bound the norms of the exponentials
that appear in the integrand. One way of doing this is through the Schur
decomposition. If Q^H A Q = diag(λ_i) + N is the Schur decomposition of
A ∈ C^{n×n}, then it can be shown that

\[ \| e^{At} \|_2 \le e^{\alpha(A)t} M_S(t), \tag{11.3.2} \]

where

\[ \alpha(A) = \max\{ \mathrm{Re}(\lambda) : \lambda \in \lambda(A) \} \tag{11.3.3} \]

and

\[ M_S(t) = \sum_{k=0}^{n-1} \frac{\| Nt \|_2^{k}}{k!}. \]

The quantity α(A) is called the spectral abscissa and with a little
manipulation it can be shown that

\[ \frac{\| e^{(A+E)t} - e^{At} \|_2}{\| e^{At} \|_2} \le t \| E \|_2 M_S(t)^{2} \exp(t M_S(t) \| E \|_2). \]

Notice that M_S(t) ≡ 1 if and only if A is normal, suggesting that the matrix
exponential problem is “well behaved” if A is normal. This observation
is confirmed by the behavior of the matrix exponential condition number
ν(A,t), defined by

\[ \nu(A,t) = \max_{\| E \|_2 \le 1} \left\| \int_{0}^{t} e^{A(t-s)} E e^{As}\,ds \right\|_2 \frac{\| A \|_2}{\| e^{At} \|_2}. \]

This quantity, discussed in Van Loan (1977), measures the sensitivity of
the map A → e^{At} in that for a given t, there is a matrix E for which

\[ \frac{\| e^{(A+E)t} - e^{At} \|_2}{\| e^{At} \|_2} \approx \nu(A,t) \frac{\| E \|_2}{\| A \|_2}. \]

Thus, if ν(A,t) is large, small changes in A can induce relatively large
changes in e^{At}. Unfortunately, it is difficult to characterize precisely those
A for which ν(A,t) is large. (This is in contrast to the linear equation
problem Ax = b, where the ill-conditioned A are neatly described in terms
of SVD.) One thing we can say, however, is that ν(A,t) ≥ t || A ||_2, with
equality holding for all non-negative t if and only if A is normal.

Dwelling a little more on the effect of non-normality, we know from the
analysis of §11.2 that approximating e^{At} involves more than just
approximating e^{zt} on λ(A). Another clue that eigenvalues do not “tell the whole
story” in the e^{At} problem has to do with the inability of the spectral
abscissa (11.3.3) to predict the size of || e^{At} ||_2 as a function of time. If A is
normal, then

\[ \| e^{At} \|_2 = e^{\alpha(A)t}. \tag{11.3.4} \]


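Thus, there is uniform decay if the eigenvalues of A are in the open left half
plane. But if A is non-normal, then e^{At} can grow before decay “sets in.”
The 2-by-2 example

\[ A = \begin{bmatrix} -1 & M \\ 0 & -1 \end{bmatrix} \quad\Longleftrightarrow\quad e^{At} = e^{-t} \begin{bmatrix} 1 & tM \\ 0 & 1 \end{bmatrix} \]

plainly illustrates this point.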
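
The growth-before-decay behavior of the 2-by-2 example can be tabulated directly from its closed form. In the sketch below the value M = 100 and the sample times are arbitrary choices, and the 2-norm of a real 2-by-2 matrix is obtained from its Frobenius norm and determinant.

```python
import math

def norm2_2x2(M):
    """2-norm (largest singular value) of a real 2-by-2 matrix."""
    fro2 = sum(x * x for row in M for x in row)
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = max(fro2 * fro2 - 4 * det * det, 0.0)
    return math.sqrt((fro2 + math.sqrt(disc)) / 2)

def exp_At(t, M):
    """Closed form of exp(At) for A = [[-1, M], [0, -1]]."""
    return [[math.exp(-t), t * M * math.exp(-t)], [0.0, math.exp(-t)]]

M = 100.0
for t in (0.0, 0.5, 1.0, 2.0, 10.0, 20.0):
    # columns: t, ||exp(At)||_2, and exp(alpha(A) t) with alpha(A) = -1
    print(t, norm2_2x2(exp_At(t, M)), math.exp(-t))
```

The second column rises well above 1 before it eventually decays, whereas the third column, which is all that the spectral abscissa predicts, decays from the start.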
11.3.3 Some Stability Issues


With this discussion we are ready to begin thinking about the stability of
Algorithm 11.3.1. A potential difficulty arises during the squaring process
if A is a matrix whose exponential grows before it decays. If

\[ G = R_{qq}\!\left( \frac{A}{2^{j}} \right) \approx e^{A/2^{j}}, \]

then it can be shown that rounding errors of order

\[ \gamma = u \| G^{2} \|_2 \| G^{4} \|_2 \| G^{8} \|_2 \cdots \| G^{2^{j-1}} \|_2 \]

can be expected to contaminate the computed G^{2^j}. If || e^{At} ||_2 has a
substantial initial growth, then it may be the case that

\[ \gamma \gg u \| G^{2^{j}} \|_2 \approx u \| e^{A} \|_2 \]

thus ruling out the possibility of small relative errors.

If A is normal, then so is the matrix G and therefore || G^m ||_2 = || G ||_2^m
for all positive integers m. Thus, γ ≈ u || G^{2^j} ||_2 ≈ u || e^A ||_2 and so the
initial growth problems disappear. The algorithm can essentially be
guaranteed to produce small relative error when A is normal. On the other
hand, it is more difficult to draw conclusions about the method when A is
non-normal because the connection between ν(A, t) and the initial growth
phenomena is unclear. However, numerical experiments suggest that
Algorithm 11.3.1 fails to produce a relatively accurate e^A only when ν(A, 1) is
correspondingly large.


11.3.4 Eigenvalues and Pseudo-Eigenvalues 


We closed §7.1 with a comment that the eigenvalues of a matrix are
generally not good "informers" when it comes to measuring nearness to
singularity, unless the matrix is normal. It is the singular values that shed
light on Ax = b sensitivity. Our discussion of the matrix exponential is
another warning to the same effect. The spectrum of a non-normal A does
not completely describe e^{At} behavior.

In many applications, the eigenvalues of a matrix "say something" about 
an underlying phenomenon that is being modeled. If the eigenvalues are 
extremely sensitive to perturbation, then what they say can be misleading. 
This has prompted the development of the idea of pseudospectra. For \epsilon > 0,
the \epsilon-pseudospectrum of a matrix A is a subset of the complex plane defined
by

    \lambda_\epsilon(A) = \{ z \in C : \| (zI - A)^{-1} \|_2 \ge 1/\epsilon \}          (11.3.5)

Qualitatively, z is a pseudo-eigenvalue of A if zI - A is sufficiently close to
singular. By convention we set \lambda_0(A) = \lambda(A). Here are some pseudospectra
properties:

1. If \epsilon_1 \le \epsilon_2, then \lambda_{\epsilon_1}(A) \subseteq \lambda_{\epsilon_2}(A).
2. \lambda_\epsilon(A) = \{ z \in C : \sigma_{\min}(zI - A) \le \epsilon \}.
3. \lambda_\epsilon(A) = \{ z \in C : z \in \lambda(A + E), for some E with \| E \|_2 \le \epsilon \}.


Plotting the pseudospectra of a non-normal matrix A can provide insight 
into behavior. Here “behavior” can mean anything from the mathematical 
behavior of an iteration to solve Ax = b to the physical behavior predicted
by a model that involves A. See Higham and Trefethen (1993), Nachtigal, 
Reddy, and Trefethen (1992), and Trefethen, Trefethen, Reddy, and Driscoll 
(1993). 
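
Property 2 gives a direct computational test for membership in a pseudospectrum. The sketch below applies it to one sample point; the matrices, the point z, and the value of \epsilon are illustrative assumptions, and the helper names are hypothetical.

```python
import numpy as np

def sigma_min(A, z):
    """Smallest singular value of zI - A."""
    n = A.shape[0]
    return np.linalg.svd(z * np.eye(n) - A, compute_uv=False)[-1]

def in_pseudospectrum(A, z, eps):
    """Property 2: z lies in the eps-pseudospectrum iff sigma_min(zI - A) <= eps."""
    return sigma_min(A, z) <= eps

# A non-normal matrix and a normal matrix with the same spectrum {-1}.
A_nonnormal = np.array([[-1.0, 100.0], [0.0, -1.0]])
A_normal = -np.eye(2)

z = -1.0 + 0.5j   # at distance 0.5 from the eigenvalue -1
eps = 0.01
print(in_pseudospectrum(A_nonnormal, z, eps))
print(in_pseudospectrum(A_normal, z, eps))
```

For the normal matrix, \sigma_{\min}(zI - A) is just the distance from z to the spectrum, so membership is decided by the eigenvalues alone. For the non-normal matrix the same point can belong to a much smaller pseudospectrum, which is the kind of sensitivity a pseudospectral plot exposes.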


Problems 


P11.3.1 Show that e^{(A+B)t} = e^{At} e^{Bt} for all t if and only if AB = BA. (Hint: Express
both sides as a power series in t and compare the coefficient of t^2.)


P11.3.2 Suppose that A is skew-symmetric. Show that both e^A and the (1,1) Padé
approximate R_{11}(A) are orthogonal. Are there any other values of p and q for which
R_{pq}(A) is orthogonal?


P11.3.3 Show that if A is nonsingular, then there exists a matrix X such that A = e^X.
Is X unique?


P11.3.4 Show that if

    \exp\left( \begin{bmatrix} -A^T & P \\ 0 & A \end{bmatrix} z \right) = \begin{bmatrix} F_{11} & F_{12} \\ 0 & F_{22} \end{bmatrix}

with n-by-n blocks, then

    F_{22}^T F_{12} = \int_0^z e^{A^T t} P e^{At} \, dt.


P11.3.5 Give an algorithm for computing e^A when A = u v^T, u, v \in R^n.

P11.3.6 Suppose A \in R^{n \times n} and that v \in R^n has unit 2-norm. Define the function
\phi(t) = \| e^{At} v \|_2^2 / 2 and show that

    \dot{\phi}(t) \le 2 \mu(A) \phi(t)

where \mu(A) = \lambda_{\max}((A + A^T)/2). Conclude that \| e^{At} \|_2 \le e^{\mu(A) t} where t \ge 0.

P11.3.7 Prove the three pseudospectra properties given in the text.


Notes and References for Sec. 11.3 


Much of what appears in this section and an extensive bibliography may be found in the 
following survey article: 


C.B. Moler and C.F. Van Loan (1978). “Nineteen Dubious Ways to Compute the Expo- 
nential of a Matrix,” SIAM Review 20, 801-36. 


Scaling and squaring with Padé approximants (Algorithm 11.3.1) and a careful
implementation of Parlett's Schur decomposition method (Algorithm 11.1.1) were found to be
among the less dubious of the nineteen methods scrutinized. Various aspects of Padé
approximation of the matrix exponential are discussed in


W. Fair and Y. Luke (1970). “Padé Approximations to the Operator Exponential,” 
Numer. Math. 14, 379-82. 

C.F. Van Loan (1977). “On the Limitation and Application of Padé Approximation to
the Matrix Exponential,” in Padé and Rational Approximation, ed. E.B. Saff and
R.S. Varga, Academic Press, New York.

R.C. Ward (1977). “Numerical Computation of the Matrix Exponential with Accuracy 
Estimate,” SIAM J. Num. Anal. 14, 600-14. 

A. Wragg (1973). “Computation of the Exponential of a Matrix I: Theoretical Consid- 
erations,” J. Inst. Math. Applic. 11, 369-75. 

A. Wragg (1975). “Computation of the Exponential of a Matrix II: Practical Consider- 
ations,” J. Inst. Math. Applic. 15, 273-78. 


A proof of equation (11.3.1) for the scalar case appears in 


R.S. Varga (1961). “On Higher-Order Stable Implicit Methods for Solving Parabolic 
Partial Differential Equations,” J. Math. Phys. 40, 220-31. 


There are many applications in control theory calling for the computation of the
matrix exponential. In the linear optimal regulator problem, for example, various integrals
involving the matrix exponential are required. See


J. Johnson and C.L. Phillips (1971). “An Algorithm for the Computation of the Integral
of the State Transition Matrix,” IEEE Trans. Auto. Cont. AC-16, 204-5.

C.F. Van Loan (1978). “Computing Integrals Involving the Matrix Exponential,” IEEE 
Trans. Auto. Cont. AC-23, 395-404. 


An understanding of the map A \to exp(At) and its sensitivity is helpful when assessing
the performance of algorithms for computing the matrix exponential. Work in this
direction includes


B. Kågström (1977). “Bounds and Perturbation Bounds for the Matrix Exponential,”
BIT 17, 39-57.

C.F. Van Loan (1977). “The Sensitivity of the Matrix Exponential,” SIAM J. Num. 
Anal. 14, 971-81. 

R. Mathias (1992). “Evaluating the Frechet Derivative of the Matrix Exponential,” 
Numer. Math. 63, 213-226. 


The computation of a logarithm of a matrix is an important area demanding much more 
work. These calculations arise in various “system identification” problems. See 


B. Singer and S. Spilerman (1976). “The Representation of Social Processes by Markov
Models,” Amer. J. Sociology 82, 1-54.

B.W. Helton (1968). “Logarithms of Matrices,” Proc. Amer. Math. Soc. 19, 733-36.


For pointers into the pseudospectra literature we recommend 


L.N. Trefethen (1992). “Pseudospectra of Matrices,” in Numerical Analysis 1991, D.F.
Griffiths and G.A. Watson (eds), Longman Scientific and Technical, Harlow, Essex,
UK, 234-262.

D.J. Higham and L.N. Trefethen (1993). “Stiffness of ODEs,” BIT 33, 285-303.

L.N. Trefethen, A.E. Trefethen, S.C. Reddy, and T.A. Driscoll (1993). “Hydrodynamic
Stability Without Eigenvalues,” Science 261, 578-584.


as well as Chaitin-Chatelin and Frayssé (1996, chapter 10). 


Chapter 12 
Special Topics 


§12.1 Constrained Least Squares
§12.2 Subset Selection Using the SVD
§12.3 Total Least Squares
§12.4 Computing Subspaces with the SVD
§12.5 Updating Matrix Factorizations
§12.6 Modified/Structured Eigenproblems


In this final chapter we discuss an assortment of problems that represent
important applications of the singular value, QR, and Schur
decompositions. We first consider least squares minimization with constraints. Two
types of constraints are considered in §12.1, quadratic inequality and linear
equality. The next two sections are also concerned with variations on the
standard LS problem. In §12.2 we consider how the vector of observations
b might be approximated by some subset of A's columns, a course of action
that is sometimes appropriate if A is rank-deficient. In §12.3 we consider
a variation of ordinary regression known as total least squares that has
appeal when A is contaminated with error. More applications of the SVD
are considered in §12.4, where various subspace calculations are considered.
In §12.5 we investigate the updating of orthogonal factorizations when the
matrix A undergoes a low-rank perturbation. Some variations of the basic
eigenvalue problem are discussed in §12.6.


Before You Begin 


Because of the topical nature of this chapter, it doesn't make sense to 
have a chapter-wide, before-you-begin advisory. Instead, each section will 
begin with pointers to earlier portions of the book, and, if appropriate, 
pointers to LAPACK and other texts.


12.1 Constrained Least Squares


In the least squares setting it is sometimes natural to minimize \| Ax - b \|_2
over a proper subset of R^n. For example, we may wish to predict b as best
we can with Ax subject to the constraint that x is a unit vector. Or, perhaps
the solution defines a fitting function f(t) which is to have prescribed values
at a finite number of points. This can lead to an equality constrained least
squares problem. In this section we show how these problems can be solved
using the QR factorization and the SVD.

Chapter 5 and §8.7 should be understood before reading this section.
LAPACK connections include: 


LAPACK: Tools for Generalized/Constrained LS Problems

    _GGLSE    Solves the equality constrained LS problem
    _GGQRF    Computes the generalized QR factorization of a matrix pair
    _GGRQF    Computes the generalized RQ factorization of a matrix pair
    _GGSVP    Converts the GSVD problem to triangular form
    _TGSJA    Computes the GSVD of a pair of triangular matrices


Complementary references include Lawson and Hanson (1974) and Björck
(1996).


12.1.1 The Problem LSQI


Least squares minimization with a quadratic inequality constraint (the
LSQI problem) is a technique that can be used whenever the solution to
the ordinary LS problem needs to be regularized. A simple LSQI problem
that arises when attempting to fit a function to noisy data is

    minimize \| Ax - b \|_2   subject to   \| Bx \|_2 \le \alpha                    (12.1.1)

where A \in R^{m \times n}, b \in R^m, B \in R^{n \times n} (nonsingular), and \alpha > 0. The
constraint defines a hyperellipsoid in R^n and is usually chosen to damp out
excessive oscillation in the fitting function. This can be done, for example,
if B is a discretized second derivative operator.

More generally, we have the problem

    minimize \| Ax - b \|_2   subject to   \| Bx - d \|_2 \le \alpha                (12.1.2)

where A \in R^{m \times n} (m \ge n), b \in R^m, B \in R^{p \times n}, d \in R^p, and \alpha \ge 0. The
generalized singular value decomposition of §8.7.3 sheds light on the
solvability of (12.1.2). Indeed, if

    U^T A X = diag(\alpha_1, \ldots, \alpha_n),    U^T U = I_m,
                                                                          (12.1.3)
    V^T B X = diag(\beta_1, \ldots, \beta_q),     V^T V = I_p,    q = \min\{p, n\}

is the generalized singular value decomposition of A and B, then (12.1.2)
transforms to

    minimize \| D_A y - \tilde{b} \|_2   subject to   \| D_B y - \tilde{d} \|_2 \le \alpha

where D_A = U^T A X and D_B = V^T B X are the diagonal matrices in (12.1.3),
\tilde{b} = U^T b, \tilde{d} = V^T d, and y = X^{-1} x. The simple form of the objective
function

    \| D_A y - \tilde{b} \|_2^2 = \sum_{i=1}^{n} (\alpha_i y_i - \tilde{b}_i)^2 + \sum_{i=n+1}^{m} \tilde{b}_i^2          (12.1.4)

and the constraint equation

    \| D_B y - \tilde{d} \|_2^2 = \sum_{i=1}^{r} (\beta_i y_i - \tilde{d}_i)^2 + \sum_{i=r+1}^{p} \tilde{d}_i^2 \le \alpha^2          (12.1.5)

facilitate the analysis of the LSQI problem. Here, r = rank(B) and we
assume that \beta_{r+1} = \cdots = \beta_q = 0.
To begin with, the problem has a solution if and only if

    \sum_{i=r+1}^{p} \tilde{d}_i^2 \le \alpha^2.

If we have equality in this expression then consideration of (12.1.4) and
(12.1.5) shows that the vector defined by

    y_i = \begin{cases}
            \tilde{d}_i / \beta_i   & i = 1:r \\
            \tilde{b}_i / \alpha_i  & i = r+1:n, \ \alpha_i \ne 0 \\
            0                       & i = r+1:n, \ \alpha_i = 0
          \end{cases}                                                     (12.1.6)

solves the LSQI problem. Otherwise

    \sum_{i=r+1}^{p} \tilde{d}_i^2 < \alpha^2                                       (12.1.7)

and we have more alternatives to pursue. The vector y \in R^n, defined by

    y_i = \begin{cases}
            \tilde{b}_i / \alpha_i  & \alpha_i \ne 0 \\
            \tilde{d}_i / \beta_i   & \alpha_i = 0
          \end{cases}          i = 1:n

is a minimizer of \| D_A y - \tilde{b} \|_2. If this vector is also feasible, then we have
a solution to (12.1.2). (This is not necessarily the solution of minimum
2-norm, however.) We therefore assume that

    \sum_{i=1, \, \alpha_i \ne 0}^{q} \left( \beta_i \frac{\tilde{b}_i}{\alpha_i} - \tilde{d}_i \right)^2 + \sum_{i=q+1}^{p} \tilde{d}_i^2 > \alpha^2.          (12.1.8)

This implies that the solution to the LSQI problem occurs on the boundary
of the feasible set. Thus, our remaining goal is to

    minimize \| D_A y - \tilde{b} \|_2   subject to   \| D_B y - \tilde{d} \|_2 = \alpha.

To solve this problem, we use the method of Lagrange multipliers. Defining

    h(\lambda, y) = \| D_A y - \tilde{b} \|_2^2 + \lambda \left( \| D_B y - \tilde{d} \|_2^2 - \alpha^2 \right)

we see that the equations 0 = \partial h / \partial y_i, i = 1:n, lead to the linear system

    (D_A^T D_A + \lambda D_B^T D_B) y = D_A^T \tilde{b} + \lambda D_B^T \tilde{d}.

Assuming that the matrix of coefficients is nonsingular, this has a solution
y(\lambda) where

    y_i(\lambda) = \begin{cases}
                     \dfrac{\alpha_i \tilde{b}_i + \lambda \beta_i \tilde{d}_i}{\alpha_i^2 + \lambda \beta_i^2}  & i = 1:q \\
                     \tilde{b}_i / \alpha_i  & i = q+1:n
                   \end{cases}

To determine the Lagrange parameter we define

    \phi(\lambda) = \| D_B y(\lambda) - \tilde{d} \|_2^2
                  = \sum_{i=1}^{r} \left( \alpha_i \frac{\beta_i \tilde{b}_i - \alpha_i \tilde{d}_i}{\alpha_i^2 + \lambda \beta_i^2} \right)^2 + \sum_{i=r+1}^{p} \tilde{d}_i^2

and seek a solution to \phi(\lambda) = \alpha^2. Equations of this type are referred to as
secular equations and we encountered them earlier in §8.5.3. From (12.1.8)
we see that \phi(0) > \alpha^2. Now \phi(\lambda) is monotone decreasing for \lambda > 0, and
(12.1.8) therefore implies the existence of a unique positive \lambda^* for which
\phi(\lambda^*) = \alpha^2. It is easy to show that this is the desired root. It can be
found through the application of any standard root-finding technique, such
as Newton's method. The solution of the original LSQI problem is then
x = X y(\lambda^*).


12.1.2 LS Minimization Over a Sphere 


For the important case of minimization over a sphere (B = I_n, d = 0), we
have the following procedure:


Algorithm 12.1.1 Given A \in R^{m \times n} with m \ge n, b \in R^m, and \alpha > 0,
the following algorithm computes a vector x \in R^n such that \| Ax - b \|_2 is
minimum, subject to the constraint that \| x \|_2 \le \alpha.

    Compute the SVD A = U \Sigma V^T, save V = [v_1, \ldots, v_n], and
    form \tilde{b} = U^T b.
    r = rank(A)
    if \sum_{i=1}^{r} (\tilde{b}_i / \sigma_i)^2 > \alpha^2
        Find \lambda^* such that \sum_{i=1}^{r} \left( \frac{\sigma_i \tilde{b}_i}{\sigma_i^2 + \lambda^*} \right)^2 = \alpha^2.
        x = \sum_{i=1}^{r} \left( \frac{\sigma_i \tilde{b}_i}{\sigma_i^2 + \lambda^*} \right) v_i
    else
        x = \sum_{i=1}^{r} \left( \frac{\tilde{b}_i}{\sigma_i} \right) v_i
    end


The SVD is the dominant computation in this algorithm. 
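
The steps of Algorithm 12.1.1 can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than a reference implementation: the function name `ls_sphere`, the rank tolerance, and the use of bisection (instead of Newton's method) for the secular equation are choices made here. The test data are chosen so that the secular equation is (8/(\lambda+4))^2 + (2/(\lambda+1))^2 = 1, the one quoted in Example 12.1.1.

```python
import numpy as np

def ls_sphere(A, b, alpha, tol=1e-14, max_iter=200):
    """Minimize ||Ax - b||_2 subject to ||x||_2 <= alpha via the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    bt = U.T @ b
    # Numerical rank with a conventional tolerance (an assumption of this sketch).
    r = int(np.sum(s > s[0] * max(A.shape) * np.finfo(float).eps))
    s, bt, V = s[:r], bt[:r], Vt[:r].T

    if np.sum((bt / s) ** 2) <= alpha ** 2:
        return V @ (bt / s), 0.0           # minimum norm LS solution is feasible

    def phi(lam):                          # secular function minus alpha^2
        return np.sum((s * bt / (s ** 2 + lam)) ** 2) - alpha ** 2

    lo, hi = 0.0, 1.0
    while phi(hi) > 0:                     # bracket the unique positive root
        hi *= 2.0
    for _ in range(max_iter):              # bisection on the monotone function phi
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    lam = 0.5 * (lo + hi)
    return V @ (s * bt / (s ** 2 + lam)), lam

A = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
b = np.array([4.0, 2.0, 3.0])
x, lam = ls_sphere(A, b, alpha=1.0)
print(lam, x)
```

Bisection is used because the secular function is monotone decreasing for \lambda > 0, so a sign-change bracket always converges; Newton's method would typically need fewer iterations.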


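Example 12.1.1 The secular equation for the problem

    \min_{\| x \|_2 = 1} \left\| \begin{bmatrix} 2 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 4 \\ 2 \\ 3 \end{bmatrix} \right\|_2

is given by

    \left( \frac{8}{\lambda + 4} \right)^2 + \left( \frac{2}{\lambda + 1} \right)^2 = 1.

For this problem we find \lambda^* = 4.57132 and x = [.93334 .35898]^T.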


12.1.3 Ridge Regression 


The problem solved by Algorithm 12.1.1 is equivalent to the Lagrange
multiplier problem of determining \lambda > 0 such that

    (A^T A + \lambda I) x = A^T b                                          (12.1.9)

and \| x \|_2 = \alpha. This equation is precisely the normal equation formulation
for the ridge regression problem

    \min_x \left\| \begin{bmatrix} A \\ \sqrt{\lambda} I \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2^2 = \min_x \| Ax - b \|_2^2 + \lambda \| x \|_2^2.

In the general ridge regression problem one has some criteria for selecting
the ridge parameter \lambda, e.g., \| x(\lambda) \|_2 = \alpha for some given \alpha. We describe a
\lambda-selection procedure that is discussed in Golub, Heath, and Wahba (1979).

Set D_k = I - e_k e_k^T = diag(1, \ldots, 1, 0, 1, \ldots, 1) \in R^{m \times m} and let x_k(\lambda)
solve

    \min_x \| D_k (Ax - b) \|_2^2 + \lambda \| x \|_2^2.                              (12.1.10)

Thus, x_k(\lambda) is the solution to the ridge regression problem with the kth row
of A and kth component of b deleted, i.e., the kth experiment is ignored.
Now consider choosing \lambda so as to minimize the cross-validation weighted
square error C(\lambda) defined by

    C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k (a_k^T x_k(\lambda) - b_k)^2.

Here, w_1, \ldots, w_m are non-negative weights and a_k^T is the kth row of A.
Noting that

    \| A x_k(\lambda) - b \|_2^2 = \| D_k (A x_k(\lambda) - b) \|_2^2 + (a_k^T x_k(\lambda) - b_k)^2

we see that (a_k^T x_k(\lambda) - b_k)^2 is the increase in the sum of squares
resulting when the kth row is "reinstated." Minimizing C(\lambda) is tantamount to
choosing \lambda such that the final model is not overly dependent on any one
experiment.

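A more rigorous analysis can make this statement precise and also suggest
a method for minimizing C(\lambda). Assuming that \lambda > 0, an algebraic
manipulation shows that

    x_k(\lambda) = x(\lambda) + \frac{a_k^T x(\lambda) - b_k}{1 - z_k^T a_k} z_k                    (12.1.11)

where z_k = (A^T A + \lambda I)^{-1} a_k and x(\lambda) = (A^T A + \lambda I)^{-1} A^T b. Applying
-a_k^T to (12.1.11) and then adding b_k to each side of the resulting equation
gives

    b_k - a_k^T x_k(\lambda) = \frac{e_k^T (I - A (A^T A + \lambda I)^{-1} A^T) b}{e_k^T (I - A (A^T A + \lambda I)^{-1} A^T) e_k}.          (12.1.12)

Noting that the residual r = (r_1, \ldots, r_m)^T = b - A x(\lambda) is given by the
formula r = [I - A (A^T A + \lambda I)^{-1} A^T] b, we see that

    C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left( \frac{r_k}{\partial r_k / \partial b_k} \right)^2.

The quotient r_k / (\partial r_k / \partial b_k) may be regarded as an inverse measure of the
"impact" of the kth observation b_k on the model. When \partial r_k / \partial b_k is small,
this says that the error in the model's prediction of b_k is somewhat
independent of b_k. The tendency for this to be true is lessened by basing the
model on the \lambda^* that minimizes C(\lambda).

The actual determination of \lambda^* is simplified by computing the SVD of
A. Indeed, if U^T A V = diag(\sigma_1, \ldots, \sigma_n) with \sigma_1 \ge \cdots \ge \sigma_n and \tilde{b} = U^T b,
then it can be shown from (12.1.12) that

    C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left[ \frac{ b_k - \sum_{j=1}^{n} u_{kj} \tilde{b}_j \dfrac{\sigma_j^2}{\sigma_j^2 + \lambda} }{ 1 - \sum_{j=1}^{n} u_{kj}^2 \dfrac{\sigma_j^2}{\sigma_j^2 + \lambda} } \right]^2.
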
The minimization of this expression is discussed in Golub, Heath, and 
Wahba (1979). 
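
The identity (12.1.12) means C(\lambda) can be evaluated without re-solving m separate ridge problems. The sketch below compares a direct leave-one-out evaluation with the residual-quotient form; the function names, the random test data, and the 1/m normalization are assumptions of this illustration.

```python
import numpy as np

def ridge(A, b, lam):
    """Solve the ridge normal equations (A^T A + lam I) x = A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def cv_direct(A, b, lam, w):
    """C(lam) by deleting each row in turn and re-solving the ridge problem."""
    m = A.shape[0]
    total = 0.0
    for k in range(m):
        keep = np.arange(m) != k
        xk = ridge(A[keep], b[keep], lam)
        total += w[k] * (A[k] @ xk - b[k]) ** 2
    return total / m

def cv_residual(A, b, lam, w):
    """C(lam) via (12.1.12): r_k over the k-th diagonal entry of I - A(A^T A + lam I)^{-1} A^T."""
    m, n = A.shape
    H = np.eye(m) - A @ np.linalg.solve(A.T @ A + lam * np.eye(n), A.T)
    r = H @ b
    return np.sum(w * (r / np.diag(H)) ** 2) / m

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
w = np.ones(8)
for lam in [0.01, 0.1, 1.0, 10.0]:
    print(lam, cv_direct(A, b, lam, w), cv_residual(A, b, lam, w))
```

The two columns printed for each \lambda should agree to rounding error; a \lambda minimizing either one could then be located with any scalar minimizer.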


12.1.4 Equality Constrained Least Squares 


We conclude the section by considering the least squares problem with 
linear equality constraints: 


    \min_{Bx = d} \| Ax - b \|_2                                            (12.1.13)

Here A \in R^{m \times n}, B \in R^{p \times n}, b \in R^m, d \in R^p, and rank(B) = p. We refer
to (12.1.13) as the LSE problem. By setting \alpha = 0 in (12.1.2) we see
that the LSE problem is a special case of the LSQI problem. However,
it is simpler to approach the LSE problem directly rather than through
Lagrange multipliers.

Assume for clarity that both A and B have full rank. Let

    Q^T B^T = \begin{bmatrix} R \\ 0 \end{bmatrix}      (R is p-by-p, the zero block is (n-p)-by-p)

be the QR factorization of B^T and set

    AQ = [ A_1 \ A_2 ],      Q^T x = \begin{bmatrix} y \\ z \end{bmatrix}

where A_1 has p columns, A_2 has n-p columns, y \in R^p, and z \in R^{n-p}.
It is clear that with these transformations (12.1.13) becomes

    \min_{R^T y = d} \| A_1 y + A_2 z - b \|_2.

Thus, y is determined from the constraint equation R^T y = d and the vector
z is obtained by solving the unconstrained LS problem

    \min_z \| A_2 z - (b - A_1 y) \|_2.

Combining the above, we see that x = Q \begin{bmatrix} y \\ z \end{bmatrix} solves (12.1.13).


Algorithm 12.1.2 Suppose A \in R^{m \times n}, B \in R^{p \times n}, b \in R^m, and d \in R^p.
If rank(A) = n and rank(B) = p, then the following algorithm minimizes
\| Ax - b \|_2 subject to the constraint Bx = d.

    B^T = QR   (QR factorization)
    Solve R(1:p, 1:p)^T y = d for y.
    A = AQ
    Find z so \| A(:, p+1:n) z - (b - A(:, 1:p) y) \|_2 is minimized.
    x = Q(:, 1:p) y + Q(:, p+1:n) z

Note that this approach to the LSE problem involves two factorizations and 
a matrix multiplication. 
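
The steps of Algorithm 12.1.2 translate almost line for line into NumPy. This is a minimal sketch assuming rank(A) = n and rank(B) = p; the function name `lse_nullspace` and the small test problem (a 3-by-2 fit with the single constraint x_1 = x_2) are illustrative choices.

```python
import numpy as np

def lse_nullspace(A, B, b, d):
    """Minimize ||Ax - b||_2 subject to Bx = d, assuming full-rank A and B."""
    p, n = B.shape
    Q, R = np.linalg.qr(B.T, mode="complete")   # Q is n-by-n, R is n-by-p
    y = np.linalg.solve(R[:p, :p].T, d)         # R(1:p,1:p)^T y = d
    AQ = A @ Q
    # Unconstrained LS problem for the remaining n-p unknowns.
    z, *_ = np.linalg.lstsq(AQ[:, p:], b - AQ[:, :p] @ y, rcond=None)
    return Q[:, :p] @ y + Q[:, p:] @ z

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([7.0, 1.0, 3.0])
B = np.array([[1.0, -1.0]])     # constraint x1 - x2 = 0
d = np.array([0.0])
x = lse_nullspace(A, B, b, d)
print(x, B @ x - d)
```

A general dense solve is used for the triangular system to keep the sketch short; a dedicated triangular solver would exploit the structure of R.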


12.1.5 The Method of Weighting 


An interesting way to obtain an approximate solution to (12.1.13) is to
solve the unconstrained LS problem

    \min_x \left\| \begin{bmatrix} A \\ \lambda B \end{bmatrix} x - \begin{bmatrix} b \\ \lambda d \end{bmatrix} \right\|_2                    (12.1.14)

for large \lambda. The generalized singular value decomposition of §8.7.3 sheds
light on the quality of the approximation. Let

    U^T A X = diag(\alpha_1, \ldots, \alpha_n) = D_A \in R^{m \times n}
    V^T B X = diag(\beta_1, \ldots, \beta_p) = D_B \in R^{p \times n}

be the GSVD of (A, B) and assume that both matrices have full rank for
clarity. If U = [u_1, \ldots, u_m], V = [v_1, \ldots, v_p], and X = [x_1, \ldots, x_n], then
it is easy to show that

    x = \sum_{i=1}^{p} \frac{v_i^T d}{\beta_i} x_i + \sum_{i=p+1}^{n} \frac{u_i^T b}{\alpha_i} x_i                    (12.1.15)

is the exact solution to (12.1.13), while

    x(\lambda) = \sum_{i=1}^{p} \frac{\alpha_i u_i^T b + \lambda^2 \beta_i v_i^T d}{\alpha_i^2 + \lambda^2 \beta_i^2} x_i + \sum_{i=p+1}^{n} \frac{u_i^T b}{\alpha_i} x_i          (12.1.16)

solves (12.1.14). Since

    x(\lambda) - x = \sum_{i=1}^{p} \frac{\alpha_i (\beta_i u_i^T b - \alpha_i v_i^T d)}{\beta_i (\alpha_i^2 + \lambda^2 \beta_i^2)} x_i          (12.1.17)

it follows that x(\lambda) \to x as \lambda \to \infty.

The appeal of this approach to the LSE problem is that no special
subroutines are required: an ordinary LS solver will do. However, for large
values of \lambda numerical problems can arise and it is necessary to take
precautions. See Powell and Reid (1968) and Van Loan (1982a).


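Example 12.1.2 The problem

    \min_{x_1 = x_2} \left\| \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 7 \\ 1 \\ 3 \end{bmatrix} \right\|_2

has solution x = [.3407821, .3407821]^T. This can be approximated by solving

    \min \left\| \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 1000 & -1000 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 7 \\ 1 \\ 3 \\ 0 \end{bmatrix} \right\|_2

which has solution x = [.3407810, .3407829]^T.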
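
The method of weighting needs nothing beyond an ordinary LS solver, as this short sketch shows on a 3-by-2 problem with the constraint x_1 = x_2 and weight 1000. The function name `lse_weighting` and the additional weights tried in the loop are assumptions of this illustration.

```python
import numpy as np

def lse_weighting(A, B, b, d, lam):
    """Approximate the LSE solution by solving the stacked LS problem (12.1.14)."""
    A_stacked = np.vstack([A, lam * B])
    b_stacked = np.concatenate([b, lam * d])
    x, *_ = np.linalg.lstsq(A_stacked, b_stacked, rcond=None)
    return x

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([7.0, 1.0, 3.0])
B = np.array([[1.0, -1.0]])     # constraint x1 - x2 = 0
d = np.array([0.0])

for lam in [1e1, 1e3, 1e5]:
    x = lse_weighting(A, B, b, d, lam)
    print(lam, x, abs(B @ x - d)[0])   # constraint violation shrinks as lam grows
```

The constraint is only satisfied approximately for any finite weight, and the stacked matrix becomes increasingly ill-conditioned as the weight grows, which is the numerical difficulty that the cited references address.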


Problems 


P12.1.1 (a) Show that if null(A) \cap null(B) \ne \{0\}, then (12.1.2) cannot have a unique
solution. (b) Give an example which shows that the converse is not true. (Hint: A^+ b
feasible.)


P12.1.2 Let p_0(x), \ldots, p_n(x) be given polynomials and (x_0, y_0), \ldots, (x_m, y_m) a given
set of coordinate pairs with x_i \in [a,b]. It is desired to find a polynomial p(x) =
\sum_{k=0}^{n} a_k p_k(x) such that \sum_{i=0}^{m} (p(x_i) - y_i)^2 is minimized subject to the constraint that

    \int_a^b [p''(x)]^2 \, dx \approx h \sum_{i=0}^{N} \left( \frac{p(z_{i-1}) - 2 p(z_i) + p(z_{i+1})}{h^2} \right)^2 \le \alpha^2

where z_i = a + ih and b = a + Nh. Show that this leads to an LSQI problem of the form
(12.1.1).


P12.1.3 Suppose Y = [y_1, \ldots, y_k] \in R^{m \times k} has the property that

    Y^T Y = diag(d_1^2, \ldots, d_k^2),      d_1 \ge d_2 \ge \cdots \ge d_k > 0.

Show that if Y = QR is the QR factorization of Y, then R is diagonal with |r_{ii}| = d_i.

P12.1.4 (a) Show that if (A^T A + \lambda I) x = A^T b, \lambda > 0, and \| x \|_2 = \alpha, then z =
(Ax - b)/\lambda solves the dual equations (A A^T + \lambda I) z = -b with \| A^T z \|_2 = \alpha. (b) Show
that if (A A^T + \lambda I) z = -b, \| A^T z \|_2 = \alpha, then x = -A^T z satisfies (A^T A + \lambda I) x = A^T b,
\| x \|_2 = \alpha.

P12.1.5 Suppose A is the m-by-1 matrix of ones and let b \in R^m. Show that the
cross-validation technique with unit weights prescribes an optimal \lambda given by

    \lambda = \left[ \left( \frac{\bar{b}}{s} \right)^2 - \frac{1}{m} \right]^{-1}

where \bar{b} = (b_1 + \cdots + b_m)/m and s^2 = \sum_{i=1}^{m} (b_i - \bar{b})^2 / (m - 1).

P12.1.6 Establish equations (12.1.15), (12.1.16), and (12.1.17).

P12.1.7 Develop an SVD version of Algorithm 12.1.2 that can handle rank deficiency
in A and B.

P12.1.8 Suppose

    A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}

where A_1 \in R^{n \times n} is nonsingular and A_2 \in R^{(m-n) \times n}. Show that

    \sigma_{\min}(A) \ge \sqrt{1 + \sigma_{\min}(A_2 A_1^{-1})^2} \ \sigma_{\min}(A_1).

P12.1.9 Consider the problem

    \min \| Ax - b \|_2   subject to   x^T B x = \beta^2,   x^T C x = \gamma^2

where A \in R^{m \times n}, b \in R^m, and B, C \in R^{n \times n}.
Assume that B and C are positive definite and that Z \in R^{n \times n} is a nonsingular matrix
with the property that Z^T B Z = diag(\lambda_1, \ldots, \lambda_n) and Z^T C Z = I_n. Assume that
\lambda_1 \ge \cdots \ge \lambda_n. (a) Show that the set of feasible x is empty unless \lambda_n \le \beta^2/\gamma^2 \le \lambda_1.
(b) Using Z, show how the two constraint problem can be converted to a single constraint
problem of the form

    \min \| \tilde{A} y - b \|_2   subject to   y^T W y = \beta^2 - \lambda_n \gamma^2

where W = diag(\lambda_1, \ldots, \lambda_n) - \lambda_n I.


P12.1.10 Suppose p \ge m \ge n and that A \in R^{m \times n} and B \in R^{m \times p}. Show how to
compute orthogonal Q \in R^{m \times m} and orthogonal V \in R^{p \times p} so that

    Q^T A = \begin{bmatrix} R \\ 0 \end{bmatrix},      Q^T B V = [ 0 \ S ]

where R \in R^{n \times n} and S \in R^{m \times m} are upper triangular.

P12.1.11 Suppose r \in R^m, y \in R^n, and \delta > 0. Show how to solve the problem

    \min \| Ey - r \|_2   subject to   E \in R^{m \times n},   \| E \|_F \le \delta.

Repeat with “min” replaced by “max”.


Notes and References for Sec. 12.1 


Roughly speaking, regularization is a technique for transforming a poorly conditioned 
problem into a stable one. Quadratically constrained least squares is an important ex- 
ample. See 


L. Eldén (1977). “Algorithms for the Regularization of Ill-Conditioned Least Squares 
Problems,” BIT 17, 134-45. 


References for cross-validation include 


G.H. Golub, M. Heath, and G. Wahba (1979). “Generalized Cross-Validation as a 
Method for Choosing a Good Ridge Parameter,” Technometrics 21, 215-23. 

L. Eldén (1985). “A Note on the Computation of the Generalized Cross-Validation
Function for Ill-Conditioned Least Squares Problems,” BIT 24, 467-472.


The LSQI problem is discussed in


G.E. Forsythe and G.H. Golub (1965). “On the Stationary Values of a Second-Degree 
Polynomial on the Unit Sphere,” SIAM J. App. Math. 14, 1050-68. 

L. Eldén (1980). “Perturbation Theory for the Least Squares Problem with Linear 
Equality Constraints,” SIAM J. Num. Anal. 17, 338-50. 

W. Gander (1981). “Least Squares with a Quadratic Constraint,” Numer. Math. 36, 
291—307. 

L. Eldén (1983). “A Weighted Pseudoinverse, Generalized Singular Values, and
Constrained Least Squares Problems,” BIT 22, 487-502.

G.W. Stewart (1984). "On the Asymptotic Behavior of Scaled Singular Value and QR 
Decompositions,” Math. Comp. 43, 483-490. 

G.H. Golub and U. von Matt (1991). "Quadratically Constrained Least Squares and 
Quadratic Problems,” Numer. Math. 59, 561-580. 

T.F. Chan, J.A. Olkin, and D. Cooley (1992). “Solving Quadratically Constrained Least 
Squares Using Black Box Solvers,” BIT 32, 481—495. 


Other computational aspects of the LSQI problem involve updating and the handling of 
banded and sparse problems. See 


K. Schittkowski and J. Stoer (1979). “A Factorization Method for the Solution of Con- 
strained Linear Least Squares Problems Allowing for Subsequent Data changes,” 
Numer. Math. 31, 431-463. 

D.P. O'Leary and J.A. Simmons (1981). “A Bidiagonalization-Regularization Procedure
for Large Scale Discretizations of Ill-Posed Problems,” SIAM J. Sci. and Stat. Comp.
2, 474-489.

Å. Björck (1984). “A General Updating Algorithm for Constrained Linear Least Squares
Problems,” SIAM J. Sci. and Stat. Comp. 5, 394-402.

L. Eldén (1984). “An Algorithm for the Regularization of Ill-Conditioned, Banded Least
Squares Problems,” SIAM J. Sci. and Stat. Comp. 5, 231-254.


Various aspects of the LSE problem are discussed and analyzed in 


M.J.D. Powell and J.K. Reid (1968). “On Applying Householder's Method to Linear
Least Squares Problems,” Proc. IFIP Congress, pp. 122-26.

C. Van Loan (1985). “On the Method of Weighting for Equality Constrained Least 
Squares Problems,” SIAM J. Numer. Anal. 22, 851—864. 

J.L. Barlow, N.K. Nichols, and R.J. Plemmons (1988). “Iterative Methods for Equality 
Constrained Least Squares Problems,” SIAM J. Sci. and Stat. Comp. 9, 892-906. 

J.L. Barlow (1988). “Error Analysis and Implementation Aspects of Deferred Correction
for Equality Constrained Least-Squares Problems,” SIAM J. Num. Anal. 25, 1340-1358.

J.L. Barlow and S.L. Handy (1988). "The Direct Solution of Weighted and Equality 
Constrained Least-Squares Problems,” SIAM J. Sci. Stat. Comp. 9, 704-716. 

J.L. Barlow and U.B. Vemulapati (1992). “A Note on Deferred Correction for Equality 
Constrained Least Squares Problems,” SIAM J. Num. Anal. 29, 249-256. 

M. Wei (1992). “Perturbation Theory for the Rank-Deficient Equality Constrained Least
Squares Problem,” SIAM J. Num. Anal. 29, 1462-1481.

M. Wei (1992). “Algebraic Properties of the Rank-Deficient Equality-Constrained and
Weighted Least Squares Problems,” Lin. Alg. and Its Applic. 161, 27-44.

M. Gulliksson and P.-Å. Wedin (1992). “Modifying the QR-Decomposition to
Constrained and Weighted Linear Least Squares,” SIAM J. Matrix Anal. Appl. 13,
1298-1313.

Å. Björck and C.C. Paige (1994). “Solution of Augmented Linear Systems Using
Orthogonal Factorizations,” BIT 34, 1-24.

M. Gulliksson (1995). “Backward Error Analysis for the Constrained and Weighted
Linear Least Squares Problem When Using the Weighted QR Factorization,” SIAM
J. Matrix Anal. Appl. 18, 675-687.


Generalized factorizations have an important bearing on generalized least squares prob- 
lems. 


C.C. Paige (1985). "The General Linear Model and the Generalized Singular Value 
Decomposition,” Lin. Alg. and Its Applic. 70, 269-284. 

C.C. Paige (1990). “Some Aspects of Generalized QR Factorization,” in Reliable Nu- 
merical Computations, M. Cox and S. Hammarling (eds), Clarendon Press, Oxford. 

E. Anderson, Z. Bai, and J. Dongarra (1992). “Generalized QR Factorization and Its 
Applications," Lin. Alg. and Its Applic. 162/163/164, 243-271. 


12.2  Subset Selection Using the SVD 


As described in §5.5, the rank-deficient LS problem \min \| Ax - b \|_2 can be
approached by approximating the minimum norm solution

    x_{LS} = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i,      r = rank(A)

with

    x_{\tilde{r}} = \sum_{i=1}^{\tilde{r}} \frac{u_i^T b}{\sigma_i} v_i,      \tilde{r} \le r

where

    A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T                              (12.2.1)

is the SVD of A and \tilde{r} is some numerically determined estimate of r. Note
that x_{\tilde{r}} minimizes \| A_{\tilde{r}} x - b \|_2 where

    A_{\tilde{r}} = \sum_{i=1}^{\tilde{r}} \sigma_i u_i v_i^T

is the closest matrix to A that has rank \tilde{r}. See Theorem 2.5.3.
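
The truncated SVD solution can be formed directly from the factors. The sketch below is illustrative only: the function name, the nearly rank-2 test matrix, and the random right-hand side are assumptions, not data from the text.

```python
import numpy as np

def truncated_svd_solution(A, b, r_tilde):
    """Return x = sum_{i <= r_tilde} (u_i^T b / sigma_i) v_i and the rank-r_tilde matrix it solves."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U[:, :r_tilde].T @ b) / s[:r_tilde]
    x = Vt[:r_tilde].T @ coeffs
    A_trunc = (U[:, :r_tilde] * s[:r_tilde]) @ Vt[:r_tilde]
    return x, A_trunc

rng = np.random.default_rng(1)
# Two independent columns plus a slightly perturbed combination of them.
C = rng.standard_normal((6, 2))
A = np.column_stack([C, C @ np.array([1.0, -1.0]) + 1e-6 * rng.standard_normal(6)])
b = rng.standard_normal(6)

x_full, _ = truncated_svd_solution(A, b, 3)    # keeps the tiny singular value
x_trunc, A2 = truncated_svd_solution(A, b, 2)  # filters it out
print(np.linalg.norm(x_full), np.linalg.norm(x_trunc))
```

Dropping the term associated with the smallest singular value can only reduce the norm of the solution, since the discarded component is orthogonal to the retained ones.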

Replacing A by A_{\tilde{r}} in the LS problem amounts to filtering the small
singular values and can make a great deal of sense in those situations where
A is derived from noisy data. In other applications, however, rank deficiency
implies redundancy among the factors that comprise the underlying model.
In this case, the model-builder may not be interested in a predictor such
as A_{\tilde{r}} x_{\tilde{r}} that involves all n redundant factors. Instead, a predictor Ay may
be sought where y has at most \tilde{r} nonzero components. The position of the
nonzero entries determines which columns of A, i.e., which factors in the
model, are to be used in approximating the observation vector b. How to
pick these columns is the problem of subset selection and is the subject of
this section.

The contents of this section depend heavily upon §2.6 and Chapter 5.


12.2.1 QR with Column Pivoting


QR with column pivoting can be regarded as a method for selecting an
independent subset of A's columns from which b might be predicted.
Suppose we apply Algorithm 5.4.1 to A \in R^{m \times n} and compute an orthogonal
Q and a permutation \Pi such that R = Q^T A \Pi is upper triangular. If
R(1:\tilde{r}, 1:\tilde{r}) z = \tilde{b}(1:\tilde{r}) where \tilde{b} = Q^T b and we set

    y = \Pi \begin{bmatrix} z \\ 0 \end{bmatrix}

then Ay is an approximate LS predictor of b that involves the first \tilde{r} columns
of A \Pi.


12.2.2 Using the SVD 


Although QR with column pivoting is a fairly reliable way to handle near
rank deficiency, the SVD is sometimes preferable for reasons discussed in
§5.5. We therefore describe an SVD-based subset selection procedure due
to Golub, Klema, and Stewart (1976) that proceeds as follows:


• Compute the SVD A = U \Sigma V^T and use it to determine a rank estimate
  \tilde{r}.


• Calculate a permutation matrix P such that the columns of the matrix
  B_1 \in R^{m \times \tilde{r}} in AP = [ B_1 \ B_2 ] are "sufficiently independent."


• Predict b with the vector Ay where y = P \begin{bmatrix} z \\ 0 \end{bmatrix} and z \in R^{\tilde{r}} minimizes
  \| B_1 z - b \|_2.

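The second step is key. Since

    \min_{z \in R^{\tilde{r}}} \| B_1 z - b \|_2 = \| Ay - b \|_2 \ge \min_{x \in R^n} \| Ax - b \|_2

it can be argued that the permutation P should be chosen to make the
residual (I - B_1 B_1^+) b as small as possible. Unfortunately, such a solution
procedure can be unstable. For example, if

    A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 + \epsilon & 1 \\ 0 & 0 & 1 \end{bmatrix},      b = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix},

\tilde{r} = 2, and P = I, then \min \| B_1 z - b \|_2 = 0, but \| B_1^+ b \|_2 = O(1/\epsilon).
On the other hand, any proper subset involving the third column of A is
strongly independent but renders a much worse residual.

This example shows that there can be a trade-off between the
independence of the chosen columns and the norm of the residual that they render.
How to proceed in the face of this trade-off requires additional
mathematical machinery in the form of useful bounds on \sigma_{\tilde{r}}(B_1), the smallest singular
value of B_1:


Theorem 12.2.1 Let the SVD of A \in R^{m \times n} be given by (12.2.1), and
define the matrix B_1 \in R^{m \times \tilde{r}}, \tilde{r} \le rank(A), by

    AP = [ B_1 \ B_2 ]      (B_1 has \tilde{r} columns, B_2 has n - \tilde{r} columns)

where P \in R^{n \times n} is a permutation. If

    P^T V = \begin{bmatrix} \tilde{V}_{11} & \tilde{V}_{12} \\ \tilde{V}_{21} & \tilde{V}_{22} \end{bmatrix}      (row and column block sizes \tilde{r} and n - \tilde{r})          (12.2.2)

and \tilde{V}_{11} is nonsingular, then

    \frac{\sigma_{\tilde{r}}(A)}{\| \tilde{V}_{11}^{-1} \|_2} \le \sigma_{\tilde{r}}(B_1) \le \sigma_{\tilde{r}}(A).

Proof. The upper bound follows from the minimax characterization of
singular values given in §8.6.1.

To establish the lower bound, partition the diagonal matrix of singular
values as follows:

    \Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}      (row block sizes \tilde{r} and m - \tilde{r}, column block sizes \tilde{r} and n - \tilde{r})

If w \in R^{\tilde{r}} is a unit vector with the property that \| B_1 w \|_2 = \sigma_{\tilde{r}}(B_1), then

    \sigma_{\tilde{r}}(B_1)^2 = \| B_1 w \|_2^2 = \left\| U \Sigma V^T P \begin{bmatrix} w \\ 0 \end{bmatrix} \right\|_2^2 = \| \Sigma_1 \tilde{V}_{11}^T w \|_2^2 + \| \Sigma_2 \tilde{V}_{12}^T w \|_2^2.

The theorem now follows because \| \Sigma_1 \tilde{V}_{11}^T w \|_2 \ge \sigma_{\tilde{r}}(A) / \| \tilde{V}_{11}^{-1} \|_2. □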
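
The two-sided bound of Theorem 12.2.1 is easy to check numerically. The sketch below does so for one random matrix and one random permutation; the dimensions, the seed, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r_tilde = 7, 5, 3
A = rng.standard_normal((m, n))
perm = rng.permutation(n)            # column order defining the permutation P
P = np.eye(n)[:, perm]               # A @ P reorders the columns of A

U, s, Vt = np.linalg.svd(A)
V = Vt.T
V11 = (P.T @ V)[:r_tilde, :r_tilde]  # leading block of P^T V, as in (12.2.2)
B1 = (A @ P)[:, :r_tilde]

sigma_B1 = np.linalg.svd(B1, compute_uv=False)[-1]   # smallest singular value of B1
lower = s[r_tilde - 1] / np.linalg.norm(np.linalg.inv(V11), 2)
upper = s[r_tilde - 1]
print(lower, sigma_B1, upper)
```

The lower bound is tight only when the leading block of P^T V is well conditioned, which is why the selection procedure tries to choose P with that goal.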


This result suggests that in the interest of obtaining a sufficiently
independent subset of columns, we choose the permutation P such that the
resulting \tilde{V}_{11} submatrix is as well-conditioned as possible. A heuristic solution to
this problem can be obtained by computing the QR with column-pivoting
factorization of the matrix [ V_{11}^T \ V_{21}^T ], where

    V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}      (row and column block sizes \tilde{r} and n - \tilde{r})

is a partitioning of the matrix V in (12.2.1). In particular, if we apply QR
with column pivoting (Algorithm 5.4.1) to compute

    Q^T [ V_{11}^T \ V_{21}^T ] P = [ R_{11} \ R_{12} ]      (R_{11} has \tilde{r} columns, R_{12} has n - \tilde{r} columns)

where Q is orthogonal, P is a permutation matrix, and R_{11} is upper
triangular, then (12.2.2) implies:

    \begin{bmatrix} \tilde{V}_{11} \\ \tilde{V}_{21} \end{bmatrix} = P^T \begin{bmatrix} V_{11} \\ V_{21} \end{bmatrix} = \begin{bmatrix} R_{11}^T Q^T \\ R_{12}^T Q^T \end{bmatrix}.

Note that R_{11} is nonsingular and that \| \tilde{V}_{11}^{-1} \|_2 = \| R_{11}^{-1} \|_2. Heuristically,
column pivoting tends to produce a well-conditioned R_{11}, and so the overall
process tends to produce a well-conditioned \tilde{V}_{11}. Thus we obtain


Algorithm 12.2.1 Given A \in R^{m \times n} and b \in R^m the following
algorithm computes a permutation P, a rank estimate \tilde{r}, and a vector z \in R^{\tilde{r}}
such that the first \tilde{r} columns of B = AP are independent and such that
\| B(:, 1:\tilde{r}) z - b \|_2 is minimized.

    Compute the SVD U^T A V = diag(\sigma_1, \ldots, \sigma_n) and save V.
    Determine \tilde{r} \le rank(A).
    Apply QR with column pivoting: Q^T V(:, 1:\tilde{r})^T P = [ R_{11} \ R_{12} ] and set
        AP = [ B_1 \ B_2 ] with B_1 \in R^{m \times \tilde{r}} and B_2 \in R^{m \times (n - \tilde{r})}.
    Determine z \in R^{\tilde{r}} such that \| b - B_1 z \|_2 = min.


Example 12.2.1 Let

$$A = \begin{bmatrix} 3 & 4 & 1.0001 \\ 7 & 4 & -3.0002 \\ 2 & 5 & 2.9999 \\ -1 & 4 & 5.0003 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}.$$

$A$ is close to being rank 2 in the sense that $\sigma_3(A) \approx .0001$. Setting $\tilde r = 2$ in Algorithm
12.2.1 leads to $x = [\,0 \;\; 0.2360 \;\; -0.0085\,]^T$ with $\|Ax - b\|_2 = .1966$. (The permutation
$P$ is given by $P = [\,e_3 \;\; e_2 \;\; e_1\,]$.) Note that $x_{LS} = [\,828.1056 \;\; -827.8569 \;\; 828.0536\,]^T$
with minimum residual $\|Ax_{LS} - b\|_2 = 0.0343$.


12.2.3 More on Column Independence vs. Residual 


We return to the discussion of the trade-off between column independence
and norm of the residual. In particular, to assess the above method of
subset selection, we need to examine the residual of the vector $y$ that it
produces: $r_y = b - Ay = b - B_1 z = (I - B_1 B_1^{+})b$. Here, $B_1 = B(:, 1{:}\tilde r)$ with
$B = AP$. To this end, it is appropriate to compare $r_y$ with $r_{x_{\tilde r}} = b - Ax_{\tilde r}$
since we are regarding $A$ as a rank-$\tilde r$ matrix and since $x_{\tilde r}$ solves the nearest
rank-$\tilde r$ LS problem, namely, $\min \|A_{\tilde r}x - b\|_2$.


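Theorem 12.2.2 If $r_y$ and $r_{x_{\tilde r}}$ are defined as above and if $\tilde V_{11}$ is the leading
$\tilde r$-by-$\tilde r$ principal submatrix of $P^T V$, then

$$\|r_{x_{\tilde r}} - r_y\|_2 \;\le\; \frac{\sigma_{\tilde r+1}(A)}{\sigma_{\tilde r}(A)}\, \|\tilde V_{11}^{-1}\|_2\, \|b\|_2 .$$


Proof. Note that $r_{x_{\tilde r}} = (I - U_1U_1^T)b$ and $r_y = (I - Q_1Q_1^T)b$ where

$$U = [\,U_1 \;\; U_2\,], \qquad U_1 \in \mathbb{R}^{m\times\tilde r},\quad U_2 \in \mathbb{R}^{m\times(m-\tilde r)},$$

is a partitioning of the matrix $U$ in (12.2.1) and where $Q_1 = B_1(B_1^TB_1)^{-1/2}$.
Using Theorem 2.6.1 we obtain

$$\|r_{x_{\tilde r}} - r_y\|_2 \;\le\; \|U_1U_1^T - Q_1Q_1^T\|_2\, \|b\|_2 = \|U_2^TQ_1\|_2\, \|b\|_2$$

while Theorem 12.2.1 permits us to conclude that

$$\|U_2^TQ_1\|_2 \;\le\; \|U_2^TB_1\|_2\, \|(B_1^TB_1)^{-1/2}\|_2 \;\le\; \sigma_{\tilde r+1}(A)\,\frac{1}{\sigma_{\tilde r}(B_1)} \;\le\; \frac{\sigma_{\tilde r+1}(A)}{\sigma_{\tilde r}(A)}\, \|\tilde V_{11}^{-1}\|_2 . \;\; □$$
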
Noting that

$$\|r_{x_{\tilde r}} - r_y\|_2 = \left\| B_1 y - \sum_{i=1}^{\tilde r} (u_i^T b)\,u_i \right\|_2$$

we see that Theorem 12.2.2 sheds light on how well $B_1y$ can predict the
“stable” component of $b$, i.e., $U_1^Tb$. Any attempt to approximate $U_2^Tb$
can lead to a large norm solution. Moreover, the theorem says that if
$\sigma_{\tilde r+1}(A) \ll \sigma_{\tilde r}(A)$, then any reasonably independent subset of columns
produces essentially the same-sized residual. On the other hand, if there
is no well-defined gap in the singular values, then the determination of $\tilde r$
becomes difficult and the entire subset selection problem more complicated.


Problems


P12.2.1 Suppose $A \in \mathbb{R}^{m\times n}$ and that $\|u^TA\|_2 = \sigma$ with $u^Tu = 1$. Show that if
$u^T(Ax - b) = 0$ for $x \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$, then $\|x\|_2 \ge |u^Tb|/\sigma$.


P12.2.2 Show that if $B_1 \in \mathbb{R}^{m\times k}$ is comprised of $k$ columns from $A \in \mathbb{R}^{m\times n}$ then
$\sigma_k(B_1) \le \sigma_k(A)$.


P12.2.3 In equation (12.2.2) we know that the matrix

$$P^T V = \begin{bmatrix} \tilde V_{11} & \tilde V_{12} \\ \tilde V_{21} & \tilde V_{22} \end{bmatrix}$$

is orthogonal. Thus, $\|\tilde V_{11}^{-1}\|_2 = \|\tilde V_{22}^{-1}\|_2$ from the CS decomposition (Theorem 2.6.3).
Show how to compute $P$ by applying the QR with column pivoting algorithm to $[\,V_{22}^T \;\; V_{12}^T\,]$.
(For $\tilde r > n/2$, this procedure would be more economical than the technique discussed in
the text.) Incorporate this observation in Algorithm 12.2.1.


Notes and References for Sec. 12.2 


The material in this section is derived from 


G.H. Golub, V. Klema and G.W. Stewart (1976). "Rank Degeneracy and Least Squares 
Problems," Technical Report TR-456, Department of Computer Science, University 
of Maryland, College Park, MD. 


A subset selection procedure based upon the total least squares fitting technique of §12.3
is given in


S. Van Huffel and J. Vandewalle (1987). "Subset Selection Using the Total Least Squares
Approach in Collinearity Problems with Errors in the Variables," Lin. Alg. and Its
Applic. 88/89, 695-714.


The literature on subset selection is vast and we refer the reader to 


H. Hotelling (1957). "The Relations of the Newer Multivariate Statistical Methods to
Factor Analysis," Brit. J. Stat. Psych. 10, 69-79.


12.3 Total Least Squares 


The problem of minimizing $\|D(Ax - b)\|_2$ where $A \in \mathbb{R}^{m\times n}$, and $D =
\mathrm{diag}(d_1,\ldots,d_m)$ is nonsingular can be recast as follows:

$$\min_{b+r\,\in\,\mathrm{range}(A)} \|Dr\|_2, \qquad r \in \mathbb{R}^m. \tag{12.3.1}$$

In this problem, there is a tacit assumption that the errors are confined to
the "observation" $b$. When error is also present in the "data" $A$, then it
may be more natural to consider the problem

$$\min_{b+r\,\in\,\mathrm{range}(A+E)} \|D[\,E,\ r\,]T\|_F, \qquad E \in \mathbb{R}^{m\times n},\ r \in \mathbb{R}^m, \tag{12.3.2}$$

where $D = \mathrm{diag}(d_1,\ldots,d_m)$ and $T = \mathrm{diag}(t_1,\ldots,t_{n+1})$ are nonsingular.
This problem, discussed in Golub and Van Loan (1980), is referred to as
the total least squares (TLS) problem.

If a minimizing $[\,E_0,\ r_0\,]$ can be found for (12.3.2), then any $x$ satisfying
$(A + E_0)x = b + r_0$ is called a TLS solution. However, it should be realized
that (12.3.2) may fail to have a solution altogether. For example, if

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix},\quad b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},\quad D = I_3,\quad T = I_3,\quad \text{and}\quad E_\epsilon = \begin{bmatrix} 0 & 0 \\ 0 & \epsilon \\ 0 & \epsilon \end{bmatrix},$$

then for all $\epsilon > 0$, $b \in \mathrm{ran}(A + E_\epsilon)$. However, there is no smallest value of
$\|[\,E,\ r\,]\|_F$ for which $b + r \in \mathrm{ran}(A + E)$.

A generalization of (12.3.2) results if we allow multiple right-hand sides.
In particular, if $B \in \mathbb{R}^{m\times k}$, then we have the problem

$$\min_{\mathrm{range}(B+R)\,\subseteq\,\mathrm{range}(A+E)} \|D[\,E,\ R\,]T\|_F \tag{12.3.3}$$

where $E \in \mathbb{R}^{m\times n}$ and $R \in \mathbb{R}^{m\times k}$ and the matrices $D = \mathrm{diag}(d_1,\ldots,d_m)$
and $T = \mathrm{diag}(t_1,\ldots,t_{n+k})$ are nonsingular. If $[\,E_0,\ R_0\,]$ solves (12.3.3),
then any $X \in \mathbb{R}^{n\times k}$ that satisfies $(A + E_0)X = (B + R_0)$ is said to be a
TLS solution to (12.3.3).

In this section we discuss some of the mathematical properties of the
total least squares problem and show how it can be solved using the SVD.
Chapter 5 is the only prerequisite. A very detailed treatment of the TLS
problem is given in the monograph by Van Huffel and Vandewalle (1991).


12.3.1 Mathematical Background 


The following theorem gives conditions for the uniqueness and existence of 
a TLS solution to the multiple right-hand side problem. 


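Theorem 12.3.1 Let $A$, $B$, $D$, and $T$ be as above and assume $m \ge n+k$.
Let

$$C = D[\,A,\ B\,]T = [\,C_1 \;\; C_2\,], \qquad C_1 \in \mathbb{R}^{m\times n},\quad C_2 \in \mathbb{R}^{m\times k},$$

have SVD $U^TCV = \mathrm{diag}(\sigma_1,\ldots,\sigma_{n+k}) = \Sigma$ where $U$, $V$, and $\Sigma$ are parti-
tioned as follows:

$$U = [\,U_1 \;\; U_2\,], \qquad V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix},$$

with $U_1$ having $n$ columns, $U_2$ having $k$ columns, $V_{11}, \Sigma_1 \in \mathbb{R}^{n\times n}$, and $V_{22}, \Sigma_2 \in \mathbb{R}^{k\times k}$.
If $\sigma_n(C_1) > \sigma_{n+1}(C)$, then the matrix $[\,E_0,\ R_0\,]$ defined by

$$D[\,E_0,\ R_0\,]T = -U_2\Sigma_2[\,V_{12}^T,\ V_{22}^T\,] \tag{12.3.4}$$

solves (12.3.3). If $T_1 = \mathrm{diag}(t_1,\ldots,t_n)$ and $T_2 = \mathrm{diag}(t_{n+1},\ldots,t_{n+k})$ then
the matrix

$$X_{TLS} = -T_1V_{12}V_{22}^{-1}T_2^{-1}$$

exists and is the unique solution to $(A + E_0)X = B + R_0$.

Proof. We first establish two results that follow from the assumption
$\sigma_n(C_1) > \sigma_{n+1}(C)$. From the equation $CV = U\Sigma$ we have $C_1V_{12} + C_2V_{22} =
U_2\Sigma_2$. We wish to show that $V_{22}$ is nonsingular. Suppose $V_{22}x = 0$ for some
unit 2-norm $x$. It follows from $V_{12}^TV_{12} + V_{22}^TV_{22} = I$ that $\|V_{12}x\|_2 = 1$. But
then

$$\sigma_{n+1}(C) \;\ge\; \|U_2\Sigma_2x\|_2 = \|C_1V_{12}x\|_2 \;\ge\; \sigma_n(C_1),$$

a contradiction. Thus, the submatrix $V_{22}$ is nonsingular.

The other fact that follows from $\sigma_n(C_1) > \sigma_{n+1}(C)$ concerns the strict
separation of $\sigma_n(C)$ and $\sigma_{n+1}(C)$. From Corollary 8.3.3, we have $\sigma_n(C) \ge
\sigma_n(C_1)$ and so $\sigma_n(C) \ge \sigma_n(C_1) > \sigma_{n+1}(C)$.

Now we are set to prove the theorem. If $\mathrm{ran}(B + R) \subseteq \mathrm{ran}(A + E)$,
then there is an $X$ ($n$-by-$k$) so $(A + E)X = B + R$, i.e.,

$$\{\,D[\,A,\ B\,]T + D[\,E,\ R\,]T\,\}\; T^{-1} \begin{bmatrix} X \\ -I_k \end{bmatrix} = 0. \tag{12.3.5}$$

Thus, the matrix in curly brackets has, at most, rank $n$. By following the
argument in Theorem 2.5.3, it can be shown that

$$\|D[\,E,\ R\,]T\|_F^2 \;\ge\; \sum_{i=n+1}^{n+k} \sigma_i(C)^2$$

and that the lower bound is realized by setting $[\,E,\ R\,] = [\,E_0,\ R_0\,]$. The
inequality $\sigma_n(C) > \sigma_{n+1}(C)$ ensures that $[\,E_0,\ R_0\,]$ is the unique minimizer.
The null space of

$$\{\,D[\,A,\ B\,]T + D[\,E_0,\ R_0\,]T\,\} = U_1\Sigma_1[\,V_{11}^T \;\; V_{21}^T\,]$$

is the range of $\begin{bmatrix} V_{12} \\ V_{22} \end{bmatrix}$. Thus, from (12.3.5)

$$T^{-1} \begin{bmatrix} X \\ -I_k \end{bmatrix} = \begin{bmatrix} V_{12} \\ V_{22} \end{bmatrix} S$$

for some $k$-by-$k$ matrix $S$. From the equations $T_1^{-1}X = V_{12}S$ and $-T_2^{-1} =
V_{22}S$ we see that $S = -V_{22}^{-1}T_2^{-1}$. Thus, we must have

$$X = T_1V_{12}S = -T_1V_{12}V_{22}^{-1}T_2^{-1} = X_{TLS}. \;\; □$$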


If $\sigma_n(C) = \sigma_{n+1}(C)$, then the TLS problem may still have a solution,
although it may not be unique. In this case, it may be desirable to single
out a “minimal norm” solution. To this end, consider the $\tau$-norm defined
on $\mathbb{R}^{n\times k}$ by $\|Z\|_\tau = \|T_1^{-1}ZT_2\|_2$. If $X$ is given by (12.3.5), then from the
CS decomposition (Theorem 2.6.3) we have

$$\|X\|_\tau^2 = \|V_{12}V_{22}^{-1}\|_2^2 = \frac{1 - \sigma_k(V_{22})^2}{\sigma_k(V_{22})^2}.$$

This suggests choosing $V$ in Theorem 12.3.1 so that $\sigma_k(V_{22})$ is maximized.


12.3.2 Computations for the k=1 Case


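We show how to maximize $V_{22}$ in the important $k = 1$ case. Suppose
the singular values of $C$ satisfy $\sigma_{n-p} > \sigma_{n-p+1} = \cdots = \sigma_{n+1}$ and let
$V = [\,v_1,\ldots,v_{n+1}\,]$ be a column partitioning of $V$. If $\tilde Q$ is a Householder
matrix such that

$$V(:,\, n+1-p{:}n+1)\,\tilde Q = \begin{bmatrix} W & z \\ 0 & \alpha \end{bmatrix}, \qquad W \in \mathbb{R}^{n\times p},\quad z \in \mathbb{R}^{n},\quad \alpha \in \mathbb{R},$$

then $\begin{bmatrix} z \\ \alpha \end{bmatrix}$ has the largest $(n+1)$-st component of all the vectors in
$\mathrm{span}\{v_{n+1-p},\ldots,v_{n+1}\}$. If $\alpha = 0$, the TLS problem has no solution. Oth-
erwise $x_{TLS} = -T_1z/(t_{n+1}\alpha)$. Moreover,

$$\begin{bmatrix} I_{n-p} & 0 \\ 0 & \tilde Q \end{bmatrix}^T U^T (D[\,A,\ b\,]T)\, V \begin{bmatrix} I_{n-p} & 0 \\ 0 & \tilde Q \end{bmatrix} = \Sigma$$

and so

$$D[\,E_0,\ r_0\,]T = -D[\,A,\ b\,]T \begin{bmatrix} z \\ \alpha \end{bmatrix} [\,z^T \;\; \alpha\,].$$

Overall, we have the following algorithm:


Algorithm 12.3.1 Given $A \in \mathbb{R}^{m\times n}$ ($m > n$), $b \in \mathbb{R}^m$, and nonsingular
$D = \mathrm{diag}(d_1,\ldots,d_m)$ and $T = \mathrm{diag}(t_1,\ldots,t_{n+1})$, the following algorithm
computes (if possible) a vector $x_{TLS} \in \mathbb{R}^n$ such that $(A + E_0)x = (b + r_0)$
and $\|D[\,E_0,\ r_0\,]T\|_F$ is minimal.

    Compute the SVD $U^T(D[\,A,\ b\,]T)V = \mathrm{diag}(\sigma_1,\ldots,\sigma_{n+1})$. Save $V$.
    Determine $p$ such that $\sigma_1 \ge \cdots \ge \sigma_{n-p} > \sigma_{n-p+1} = \cdots = \sigma_{n+1}$.
    Compute a Householder matrix $P$ such that if $\tilde V = VP$, then
        $\tilde V(n+1,\, n-p+1{:}n) = 0$
    if $\tilde v_{n+1,n+1} \ne 0$
        for $i = 1{:}n$
            $x_i = -t_i\,\tilde v_{i,n+1}/(t_{n+1}\,\tilde v_{n+1,n+1})$
        end
    end


This algorithm requires about $2mn^2 + 12n^3$ flops and most of these are
associated with the SVD computation.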


Example 12.3.1 The TLS problem

$$\min_{(a+e)x \,=\, b+r} \|[\,e,\ r\,]\|_F$$

where $a = [\,1,\ 2,\ 3,\ 4\,]^T$ and $b = [\,2.01,\ 3.99,\ 5.80,\ 8.30\,]^T$ has solution $x_{TLS} = 2.0212$,
$e = [\,-.0045,\ -.0209,\ -.1048,\ .0855\,]^T$, and $r = [\,.0022,\ .0103,\ .0519,\ -.0423\,]^T$.
Note that for this data $x_{LS} = 2.0197$.
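
For a single column ($n = 1$, $D = I$, $T = I$) the SVD step of Algorithm 12.3.1 reduces to a 2-by-2 symmetric eigenproblem, so the computation can be followed by hand. The sketch below is an illustration under those assumptions only, not the general algorithm; the function name `tls_single_column` is hypothetical. It uses the fact that the right singular vector of $C = [\,a,\ b\,]$ for $\sigma_2$ is an eigenvector of $C^TC$ for its smallest eigenvalue.

```python
import math

def tls_single_column(a, b):
    """TLS solution of (a + e) x = b + r minimizing ||[e, r]||_F (n = 1, D = I, T = I)."""
    saa = sum(ai * ai for ai in a)
    sab = sum(ai * bi for ai, bi in zip(a, b))
    sbb = sum(bi * bi for bi in b)
    # Smallest eigenvalue of C^T C = [[saa, sab], [sab, sbb]]; it equals sigma_2(C)^2.
    disc = math.sqrt((saa - sbb) ** 2 + 4.0 * sab * sab)
    lam = 0.5 * (saa + sbb - disc)
    # Eigenvector v = (v1, v2) for lam: (saa - lam) v1 + sab v2 = 0.
    v1, v2 = -sab, saa - lam
    if v2 == 0.0:
        raise ValueError("no TLS solution: last component of v is zero")
    nrm = math.hypot(v1, v2)
    v1, v2 = v1 / nrm, v2 / nrm
    x = -v1 / v2
    # Minimal perturbation [e, r] = -C v v^T.
    cv = [ai * v1 + bi * v2 for ai, bi in zip(a, b)]
    e = [-ci * v1 for ci in cv]
    r = [-ci * v2 for ci in cv]
    return x, e, r

a = [1.0, 2.0, 3.0, 4.0]
b = [2.01, 3.99, 5.80, 8.30]
x_tls, e, r = tls_single_column(a, b)
x_ls = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
print(round(x_tls, 4), round(x_ls, 4))
```

With the data of Example 12.3.1 the perturbed system is consistent, i.e. $(a_i + e_i)x = b_i + r_i$ holds for every $i$ up to roundoff.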
12.3.3 Geometric Interpretation


It can be shown that the TLS solution $x_{TLS}$ minimizes

$$\psi(x) = \sum_{i=1}^{m} d_i^2\, \frac{|a_i^Tx - b_i|^2}{x^TT_1^{-2}x + t_{n+1}^{-2}}$$

where $a_i^T$ is the $i$th row of $A$ and $b_i$ is the $i$th component of $b$. A geometrical
interpretation of the TLS problem is made possible by this observation.
Indeed

$$\frac{|a_i^Tx - b_i|^2}{x^TT_1^{-2}x + t_{n+1}^{-2}}$$

is the square of the distance from $\begin{bmatrix} a_i \\ b_i \end{bmatrix} \in \mathbb{R}^{n+1}$ to the nearest point in
the subspace

$$P_x = \left\{ \begin{bmatrix} a \\ b \end{bmatrix} : a \in \mathbb{R}^n,\ b \in \mathbb{R},\ b = x^Ta \right\}$$

where distance in $\mathbb{R}^{n+1}$ is measured by the norm $\|z\| = \|Tz\|_2$. A great
deal has been written about this kind of fitting. See Pearson (1901) and
Madansky (1959).
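
The following sketch evaluates $\psi$ directly for a small problem so the minimizing property can be checked numerically. It is an illustration only: the function name `psi` and the list-based data layout are assumptions, and the data are those of the single-column example with $D = I$ and $T = I$, where $\psi(x) = \|ax - b\|_2^2/(x^2 + 1)$.

```python
def psi(x, a_rows, b, d, t):
    """psi(x) = sum_i d_i^2 (a_i^T x - b_i)^2 / (x^T T1^{-2} x + t_{n+1}^{-2})."""
    n = len(x)
    denom = sum((x[j] / t[j]) ** 2 for j in range(n)) + 1.0 / t[n] ** 2
    num = 0.0
    for ai, bi, di in zip(a_rows, b, d):
        res = sum(ai[j] * x[j] for j in range(n)) - bi
        num += (di * res) ** 2
    return num / denom

a_rows = [[1.0], [2.0], [3.0], [4.0]]
b = [2.01, 3.99, 5.80, 8.30]
d = [1.0, 1.0, 1.0, 1.0]          # D = I
t = [1.0, 1.0]                    # T = I (n + 1 = 2 entries)

psi_tls = psi([2.0212], a_rows, b, d, t)   # TLS solution quoted in Example 12.3.1
psi_ls = psi([2.0197], a_rows, b, d, t)    # ordinary least squares solution
print(psi_tls < psi_ls)
```

The ordinary least squares solution minimizes only the numerator, so it gives a slightly larger value of $\psi$ than the TLS solution.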


Problems 


P12.3.1 Consider the TLS problem (12.3.2) with nonsingular $D$ and $T$. (a) Show that
if $\mathrm{rank}(A) < n$, then (12.3.2) has a solution if and only if $b \in \mathrm{ran}(A)$. (b) Show that if
$\mathrm{rank}(A) = n$, then (12.3.2) has no solution if $A^TD^2b = 0$ and $|t_{n+1}|\,\|Db\|_2 \ge \sigma_n(DAT_1)$
where $T_1 = \mathrm{diag}(t_1,\ldots,t_n)$.


P12.3.2 Show that if $C = D[\,A,\ b\,]T = [\,A_1,\ d\,]$ and $\sigma_n(C) > \sigma_{n+1}(C)$, then the TLS
solution $x$ satisfies $(A_1^TA_1 - \sigma_{n+1}(C)^2I)x = A_1^Td$.


P12.3.3 Show how to solve (12.3.2) with the added constraint that the first p columns 
of the minimizing E are zero. 


Notes and References for Sec. 12.3 


This section is based upon 


G.H. Golub and C.F. Van Loan (1980). "An Analysis of the Total Least Squares Prob-
lem," SIAM J. Num. Anal. 17, 883-93.


The bearing of the SVD on the TLS problem is set forth in 


G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares
Solutions," Numer. Math. 14, 403-420.




G.H. Golub (1973). "Some Modified Matrix Eigenvalue Problems,” SIAM Review 15, 
318—334. 


The most detailed study of the TLS problem is 


S. Van Huffel and J. Vandewalle (1991). The Total Least Squares Problem: Computa- 
tional Aspects and Analysis, SIAM Publications, Philadelphia. 


If some of the columns of A are known exactly then it is sensible to force the TLS per-
turbation matrix E to be zero in the same columns. Aspects of this constrained TLS
problem are discussed in


J.W. Demmel (1987). "The Smallest Perturbation of a Submatrix which Lowers the
Rank and Constrained Total Least Squares Problems," SIAM J. Numer. Anal. 24,
199-206.

S. Van Huffel and J. Vandewalle (1988). "The Partial Total Least Squares Algorithm,"
J. Comp. and App. Math. 21, 333-342.

S. Van Huffel and J. Vandewalle (1988). "Analysis and Solution of the Nongeneric Total
Least Squares Problem," SIAM J. Matrix Anal. Appl. 9, 360-372.

S. Van Huffel and J. Vandewalle (1989). "Analysis and Properties of the Generalized
Total Least Squares Problem AX ≈ B When Some or All Columns in A are Subject
to Error," SIAM J. Matrix Anal. Appl. 10, 294-315.

S. Van Huffel and H. Zha (1991). "The Restricted Total Least Squares Problem: For-
mulation, Algorithm, and Properties," SIAM J. Matrix Anal. Appl. 12, 292-309.

S. Van Huffel (1992). "On the Significance of Nongeneric Total Least Squares Problems,"
SIAM J. Matrix Anal. Appl. 13, 20-35.

M. Wei (1992). "The Analysis for the Total Least Squares Problem with More than One
Solution," SIAM J. Matrix Anal. Appl. 13, 746-763.

S. Van Huffel and H. Zha (1993). "An Efficient Total Least Squares Algorithm Based
On a Rank-Revealing Two-Sided Orthogonal Decomposition," Numerical Algorithms
4, 101-133.

C.C. Paige and M. Wei (1993). "Analysis of the Generalized Total Least Squares Problem
AX = B when Some of the Columns are Free of Error," Numer. Math. 65, 171-202.

R.D. Fierro and J.R. Bunch (1994). "Collinearity and Total Least Squares," SIAM J.
Matrix Anal. Appl. 15, 1167-1181.


Other references concerned with least squares fitting when there are errors in the data 
matrix include 


K. Pearson (1901). "On Lines and Planes of Closest Fit to Points in Space," Phil. Mag.
2, 559-72.

A. Wald (1940). "The Fitting of Straight Lines if Both Variables are Subject to Error," 
Annals of Mathematical Statistics 11, 284—300. 

A. Madansky (1959). “The Fitting of Straight Lines When Both Variables Are Subject 
to Error,” J. Amer. Stat. Assoc. 54, 173-205. 

I. Linnik (1961). Method of Least Squares and Principles of the Theory of Observations, 
Pergamon Press, New York. 

W.G. Cochrane (1968). “Errors of Measurement in Statistics," Technometrics 10, 637— 
66. 

R.F. Gunst, J.T. Webster, and R.L. Mason (1976). “A Comparison of Least Squares 
and Latent Root Regression Estimators,” Technometrics 18, 75—83. 

G.W. Stewart (1977c). "Sensitivity Coefficients for the Effects of Errors in the Inde-
pendent Variables in a Linear Regression," Technical Report TR-571, Department of
Computer Science, University of Maryland, College Park, MD.

A. Van der Sluis and G.W. Veltkamp (1979). "Restoring Rank and Consistency by
Orthogonal Projection," Lin. Alg. and Its Applic. 28, 257-78.


12.4 Computing Subspaces with the SVD


It is sometimes necessary to investigate the relationship between two given 
subspaces. How close are they? Do they intersect? Can one be “rotated” 
into the other? And so on. In this section we show how questions like 
these can be answered using the singular value decomposition. Knowledge
of Chapter 5 and §8.6 is assumed.


12.4.1 Rotation of Subspaces 


Suppose $A \in \mathbb{R}^{m\times p}$ is a data matrix obtained by performing a certain set
of experiments. If the same set of experiments is performed again, then a
different data matrix, $B \in \mathbb{R}^{m\times p}$, is obtained. In the orthogonal Procrustes
problem the possibility that $B$ can be rotated into $A$ is explored by solving
the following problem:

$$\text{minimize } \|A - BQ\|_F \quad \text{subject to } Q^TQ = I_p. \tag{12.4.1}$$

Recall that the trace of a matrix is the sum of its diagonal entries and thus,
$\mathrm{tr}(C^TC) = \|C\|_F^2$. It follows that if $Q \in \mathbb{R}^{p\times p}$ is orthogonal, then

$$\|A - BQ\|_F^2 = \mathrm{tr}(A^TA) + \mathrm{tr}(B^TB) - 2\,\mathrm{tr}(Q^TB^TA).$$

Thus, (12.4.1) is equivalent to the problem of maximizing $\mathrm{tr}(Q^TB^TA)$.

The maximizing $Q$ can be found by calculating the SVD of $B^TA$. In-
deed, if $U^T(B^TA)V = \Sigma = \mathrm{diag}(\sigma_1,\ldots,\sigma_p)$ is the SVD of this matrix
and we define the orthogonal matrix $Z$ by $Z = V^TQ^TU$, then

$$\mathrm{tr}(Q^TB^TA) = \mathrm{tr}(Q^TU\Sigma V^T) = \mathrm{tr}(Z\Sigma) = \sum_{i=1}^{p} z_{ii}\sigma_i \;\le\; \sum_{i=1}^{p} \sigma_i .$$

Clearly, the upper bound is attained by setting $Q = UV^T$ for then $Z = I_p$.
This gives the following algorithm:


Algorithm 12.4.1 Given $A$ and $B$ in $\mathbb{R}^{m\times p}$, the following algorithm finds
an orthogonal $Q \in \mathbb{R}^{p\times p}$ such that $\|A - BQ\|_F$ is minimum.


    $C = B^TA$
    Compute the SVD $U^TCV = \Sigma$. Save $U$ and $V$.
    $Q = UV^T$.


The solution matrix $Q$ is the orthogonal polar factor of $B^TA$. See §4.2.10.
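
For $p = 2$ the maximization of $\mathrm{tr}(Q^TB^TA)$ can be carried out in closed form, because every 2-by-2 orthogonal matrix is either a rotation or a reflection. The sketch below uses that shortcut in place of the SVD step of Algorithm 12.4.1; it is an illustration for the 2-column case only, and the function name `procrustes_2x2` and the test data are assumptions.

```python
import math

def procrustes_2x2(A, B):
    """Orthogonal 2-by-2 Q minimizing ||A - B Q||_F; A and B are lists of rows of length 2."""
    # M = B^T A
    m11 = sum(b[0] * a[0] for a, b in zip(A, B))
    m12 = sum(b[0] * a[1] for a, b in zip(A, B))
    m21 = sum(b[1] * a[0] for a, b in zip(A, B))
    m22 = sum(b[1] * a[1] for a, b in zip(A, B))
    # A rotation [[c, -s], [s, c]] gives tr(Q^T M) = c (m11 + m22) + s (m21 - m12).
    rot = math.hypot(m11 + m22, m21 - m12)
    # A reflection [[c, s], [s, -c]] gives tr(Q^T M) = c (m11 - m22) + s (m12 + m21).
    ref = math.hypot(m11 - m22, m12 + m21)
    if rot == 0.0 and ref == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]      # M = 0: every orthogonal Q is optimal
    if rot >= ref:
        c, s = (m11 + m22) / rot, (m21 - m12) / rot
        return [[c, -s], [s, c]]
    c, s = (m11 - m22) / ref, (m12 + m21) / ref
    return [[c, s], [s, -c]]

theta = 0.3
Q_true = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
# B = A Q_true^T, so that B Q_true = A exactly.
B = [[a[0] * Q_true[0][0] + a[1] * Q_true[0][1],
      a[0] * Q_true[1][0] + a[1] * Q_true[1][1]] for a in A]
Q = procrustes_2x2(A, B)
print(Q)
```

The larger of the two candidate traces equals $\sigma_1 + \sigma_2$ of $B^TA$, which is the bound derived above; for $p > 2$ the SVD is needed.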


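Example 12.4.1 If

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} 1.2 & 2.1 \\ 2.9 & 4.3 \\ 5.2 & 6.1 \\ 6.8 & 8.1 \end{bmatrix},$$

then

$$Q = \begin{bmatrix} .9999 & -.0126 \\ .0126 & .9999 \end{bmatrix}$$

minimizes $\|A - BQ\|_F$ over all orthogonal $Q \in \mathbb{R}^{2\times 2}$.


12.4.2 Intersection of Null Spaces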


Let $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times n}$ be given, and consider the problem of finding
an orthonormal basis for $\mathrm{null}(A) \cap \mathrm{null}(B)$. One approach is to compute
the null space of the matrix

$$C = \begin{bmatrix} A \\ B \end{bmatrix}$$

since $Cx = 0 \iff x \in \mathrm{null}(A) \cap \mathrm{null}(B)$. However, a more economical
procedure results if we exploit the following theorem.


Theorem 12.4.1 Suppose $A \in \mathbb{R}^{m\times n}$ and let $\{z_1,\ldots,z_t\}$ be an orthonor-
mal basis for $\mathrm{null}(A)$. Define $Z = [\,z_1,\ldots,z_t\,]$ and let $\{w_1,\ldots,w_q\}$ be an
orthonormal basis for $\mathrm{null}(BZ)$ where $B \in \mathbb{R}^{p\times n}$. If $W = [\,w_1,\ldots,w_q\,]$,
then the columns of $ZW$ form an orthonormal basis for $\mathrm{null}(A) \cap \mathrm{null}(B)$.


Proof. Since $AZ = 0$ and $(BZ)W = 0$, we clearly have $\mathrm{ran}(ZW) \subseteq
\mathrm{null}(A) \cap \mathrm{null}(B)$. Now suppose $x$ is in both $\mathrm{null}(A)$ and $\mathrm{null}(B)$. It follows
that $x = Za$ for some $0 \ne a \in \mathbb{R}^t$. But since $0 = Bx = BZa$, we must have
$a = Wb$ for some $b \in \mathbb{R}^q$. Thus, $x = ZWb \in \mathrm{ran}(ZW)$. □


When the SVD is used to compute the orthonormal bases in this theorem 
we obtain the following procedure: 


Algorithm 12.4.2 Given $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times n}$, the following al-
gorithm computes an integer $s$ and a matrix $Y = [\,y_1,\ldots,y_s\,]$ having
orthonormal columns which span $\mathrm{null}(A) \cap \mathrm{null}(B)$. If the intersection is
trivial then $s = 0$.


    Compute the SVD $U_A^TAV_A = \mathrm{diag}(\sigma_i)$. Save $V_A$ and set
        $r = \mathrm{rank}(A)$.
    if $r < n$
        $C = BV_A(:,\, r+1{:}n)$
        Compute the SVD $U_C^TCV_C = \mathrm{diag}(\gamma_i)$. Save $V_C$ and set
            $q = \mathrm{rank}(C)$.
        if $q < n - r$
            $s = n - r - q$
            $Y = V_A(:,\, r+1{:}n)\,V_C(:,\, q+1{:}n-r)$
        else
            $s = 0$
        end
    else
        $s = 0$
    end

The amount of work required by this algorithm depends upon the relative
sizes of $m$, $n$, $p$, and $r$.

We mention that a practical implementation of this algorithm requires
a means for deciding when a computed singular value $\hat\sigma_i$ is negligible. The
use of a tolerance $\delta$ for this purpose (e.g. $\hat\sigma_i < \delta \Rightarrow \hat\sigma_i = 0$) implies that
the columns of the computed $\hat Y$ “almost” define a common null space of $A$
and $B$ in the sense that $\|A\hat Y\|_2 \approx \|B\hat Y\|_2 \approx \delta$.


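Example 12.4.2 If

$$A = \begin{bmatrix} 1 & -1 & 1 \\ 1 & -1 & 1 \\ 1 & -1 & 1 \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} 4 & 2 & 0 \\ 2 & 1 & 0 \\ 6 & 3 & 0 \end{bmatrix}$$

then $\mathrm{null}(A) \cap \mathrm{null}(B) = \mathrm{span}\{x\}$, where $x = [\,1 \;\; -2 \;\; -3\,]^T$. Applying Algorithm 12.4.2
we find

$$V_A(:,\,2{:}3)\,V_C(:,\,2{:}2) = \begin{bmatrix} -.8165 & .0000 \\ -.4082 & .7071 \\ .4082 & .7071 \end{bmatrix} \begin{bmatrix} -.3273 \\ -.9449 \end{bmatrix} \approx \begin{bmatrix} .2673 \\ -.5345 \\ -.8018 \end{bmatrix} \approx .2673 \begin{bmatrix} 1 \\ -2 \\ -3 \end{bmatrix}.$$


12.4.3 Angles Between Subspaces
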
Let $F$ and $G$ be subspaces in $\mathbb{R}^m$ whose dimensions satisfy

$$p = \dim(F) \ge \dim(G) = q \ge 1.$$

The principal angles $\theta_1,\ldots,\theta_q \in [0,\pi/2]$ between $F$ and $G$ are defined
recursively by

$$\cos(\theta_k) = \max_{u\in F}\ \max_{v\in G}\ u^Tv = u_k^Tv_k$$

subject to:

$$\|u\| = \|v\| = 1, \qquad u^Tu_i = 0, \quad v^Tv_i = 0, \quad i = 1{:}k-1.$$

Note that the principal angles satisfy $0 \le \theta_1 \le \cdots \le \theta_q \le \pi/2$. The vectors
$\{u_1,\ldots,u_q\}$ and $\{v_1,\ldots,v_q\}$ are called the principal vectors between the
subspaces $F$ and $G$.

Principal angles and vectors arise in many important statistical appli-
cations. The largest principal angle is related to the notion of distance
between equidimensional subspaces that we discussed in §2.6.3. If $p = q$
then $\mathrm{dist}(F,G) = \sqrt{1 - \cos(\theta_p)^2} = \sin(\theta_p)$.

If the columns of $Q_F \in \mathbb{R}^{m\times p}$ and $Q_G \in \mathbb{R}^{m\times q}$ define orthonormal bases
for $F$ and $G$ respectively, then

$$\max_{\substack{u\in F \\ \|u\|_2=1}}\ \max_{\substack{v\in G \\ \|v\|_2=1}} u^Tv \;=\; \max_{\substack{y\in\mathbb{R}^p \\ \|y\|_2=1}}\ \max_{\substack{z\in\mathbb{R}^q \\ \|z\|_2=1}} y^T(Q_F^TQ_G)z .$$

From the minimax characterization of singular values given in Theorem
8.6.1 it follows that if $Y^T(Q_F^TQ_G)Z = \mathrm{diag}(\sigma_1,\ldots,\sigma_q)$ is the SVD of
$Q_F^TQ_G$, then we may define the $u_k$, $v_k$, and $\theta_k$ by

$$[\,u_1,\ldots,u_p\,] = Q_FY, \qquad [\,v_1,\ldots,v_q\,] = Q_GZ, \qquad \cos(\theta_k) = \sigma_k, \quad k = 1{:}q.$$

Typically, the spaces $F$ and $G$ are defined as the ranges of given matrices
$A \in \mathbb{R}^{m\times p}$ and $B \in \mathbb{R}^{m\times q}$. In this case the desired orthonormal bases can
be obtained by computing the QR factorizations of these two matrices.


Algorithm 12.4.3 Given $A \in \mathbb{R}^{m\times p}$ and $B \in \mathbb{R}^{m\times q}$ ($p \ge q$) each with lin-
early independent columns, the following algorithm computes the orthogo-
nal matrices $U = [\,u_1,\ldots,u_q\,]$ and $V = [\,v_1,\ldots,v_q\,]$ and $\cos(\theta_1),\ldots,\cos(\theta_q)$
such that the $\theta_k$ are the principal angles between $\mathrm{ran}(A)$ and $\mathrm{ran}(B)$ and
$u_k$ and $v_k$ are the associated principal vectors.


    Use Algorithm 5.2.1 to compute the QR factorizations

        $A = Q_AR_A, \quad Q_A^TQ_A = I_p, \quad R_A \in \mathbb{R}^{p\times p}$
        $B = Q_BR_B, \quad Q_B^TQ_B = I_q, \quad R_B \in \mathbb{R}^{q\times q}$

    $C = Q_A^TQ_B$
    Compute the SVD $Y^TCZ = \mathrm{diag}(\cos(\theta_k))$.
    $Q_AY(:,\,1{:}q) = [\,u_1,\ldots,u_q\,]$
    $Q_BZ = [\,v_1,\ldots,v_q\,]$


This algorithm requires about $4m(q^2 + 2p^2) + 2pq(m + q) + 12q^3$ flops.

The idea of using the SVD to compute the principal angles and vectors
is due to Björck and Golub (1973). The problem of rank deficiency in A
and B is also treated in this paper.
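
When both subspaces are two-dimensional, $C = Q_A^TQ_B$ is 2-by-2 and its singular values have a closed form, so the cosines can be computed without an SVD routine. The sketch below is an illustration for that special case only. It swaps in modified Gram-Schmidt for the Householder QR of Algorithm 5.2.1, and the function names and the 3-by-2 test matrices are assumptions.

```python
import math

def orthonormal_basis(cols):
    """Modified Gram-Schmidt on a list of column vectors (assumed linearly independent)."""
    basis = []
    for v in cols:
        w = list(v)
        for q in basis:
            h = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - h * qi for qi, wi in zip(q, w)]
        nrm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / nrm for wi in w])
    return basis

def principal_cosines_2d(a_cols, b_cols):
    """Cosines of the two principal angles between span(a_cols) and span(b_cols)."""
    qa = orthonormal_basis(a_cols)
    qb = orthonormal_basis(b_cols)
    # C = Q_A^T Q_B is 2-by-2.
    c = [[sum(x * y for x, y in zip(u, v)) for v in qb] for u in qa]
    t = sum(cij * cij for row in c for cij in row)   # sigma_1^2 + sigma_2^2
    d = c[0][0] * c[1][1] - c[0][1] * c[1][0]        # +/- sigma_1 sigma_2
    disc = math.sqrt(max(t * t - 4.0 * d * d, 0.0))  # sigma_1^2 - sigma_2^2
    s1 = math.sqrt(0.5 * (t + disc))
    s2 = math.sqrt(max(0.5 * (t - disc), 0.0))
    return min(s1, 1.0), min(s2, 1.0)

# Columns of two 3-by-2 matrices that share the direction (1, 3, 5).
a_cols = [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
b_cols = [[1.0, 3.0, 5.0], [5.0, 7.0, -1.0]]
cos1, cos2 = principal_cosines_2d(a_cols, b_cols)
print(round(cos1, 3), round(cos2, 3))
```

Because the two column spaces share a direction, the first cosine is 1 to working precision; the second equals the cosine of the angle between the two planes.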


12.4.4 Intersection of Subspaces 


Algorithm 12.4.3 can also be used to compute an orthonormal basis for
$\mathrm{ran}(A) \cap \mathrm{ran}(B)$ where $A \in \mathbb{R}^{m\times p}$ and $B \in \mathbb{R}^{m\times q}$.


Theorem 12.4.2 Let $\{\cos(\theta_k), u_k, v_k\}_{k=1}^{q}$ be defined by Algorithm 12.4.3.
If the index $s$ is defined by $1 = \cos(\theta_1) = \cdots = \cos(\theta_s) > \cos(\theta_{s+1})$, then
we have

$$\mathrm{ran}(A) \cap \mathrm{ran}(B) = \mathrm{span}\{u_1,\ldots,u_s\} = \mathrm{span}\{v_1,\ldots,v_s\}.$$

Proof. The proof follows from the observation that if $\cos(\theta_k) = 1$, then
$u_k = v_k$. □


With inexact arithmetic, it is necessary to compute the approximate mul-
tiplicity of the unit cosines in Algorithm 12.4.3.

Example 12.4.3 If

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} 1 & 5 \\ 3 & 7 \\ 5 & -1 \end{bmatrix}$$

then the cosines of the principal angles between $\mathrm{ran}(A)$ and $\mathrm{ran}(B)$ are 1.000 and .856.

Problems 


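P12.4.1 Show that if $A$ and $B$ are $m$-by-$p$ matrices, with $p \le m$, then

$$\min_{Q^TQ = I_p} \|A - BQ\|_F^2 = \sum_{i=1}^{p} \left( \sigma_i(A)^2 - 2\sigma_i(B^TA) + \sigma_i(B)^2 \right).$$


P12.4.2 Extend Algorithm 12.4.2 so that it can compute an orthonormal basis for
$\mathrm{null}(A_1) \cap \cdots \cap \mathrm{null}(A_s)$.


P12.4.3 Extend Algorithm 12.4.3 to handle the case when $A$ and $B$ are rank deficient.


P12.4.4 Relate the principal angles and vectors between $\mathrm{ran}(A)$ and $\mathrm{ran}(B)$ to the
eigenvalues and eigenvectors of the generalized eigenvalue problem

$$\begin{bmatrix} 0 & A^TB \\ B^TA & 0 \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix} = \sigma \begin{bmatrix} A^TA & 0 \\ 0 & B^TB \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix}.$$


P12.4.5 Suppose $A, B \in \mathbb{R}^{m\times n}$ and that $A$ has full column rank. Show how to compute
a symmetric matrix $X \in \mathbb{R}^{n\times n}$ that minimizes $\|AX - B\|_F$. Hint: Compute the SVD
of $A$.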


Notes and References for Sec. 12.4 


The problem of minimizing $\|A - BQ\|_F$ over all orthogonal matrices arises in psycho-
metrics. See


B. Green (1952). “The Orthogonal Approximation of an Oblique Structure in Factor 
Analysis,” Psychometrika 17, 429-40. 

P. Schonemann (1966). “A Generalized Solution of the Orthogonal Procrustes Problem,” 
Psychometrika 31, 1-10. 

I.Y. Bar-Itzhack (1975). "Iterative Optimal Orthogonalization of the Strapdown Ma-
trix," IEEE Trans. Aerospace and Electronic Systems 11, 30-37.

R.J. Hanson and M.J. Norris (1981). “Analysis of Measurements Based on the Singular 
Value Decomposition,” SIAM J. Sci. and Stat. Comp. 2, 363-374. 

H. Park (1991). "A Parallel Algorithm for the Unbalanced Orthogonal Procrustes Prob-
lem," Parallel Computing 17, 913-923.

When B = I, this problem amounts to finding the closest orthogonal matrix to A. This
is equivalent to the polar decomposition problem of §4.2.10. See


A. Björck and C. Bowie (1971). “An Iterative Algorithm for Computing the Best Esti- 
mate of an Orthogonal Matrix,” SIAM J. Num. Anal. 8, 358-64. 


N.J. Higham (1986). “Computing the Polar Decomposition—with Applications,” SIAM 
J. Sci. and Stat. Comp. 7, 1160-1174. 


If A is reasonably close to being orthogonal itself, then Björck and Bowie's technique is
more efficient than the SVD algorithm.


The problem of minimizing $\|AX - B\|_F$ subject to the constraint that X is sym-
metric is studied in


N.J. Higham (1988). "The Symmetric Procrustes Problem," BIT 28, 133-43.

Using the SVD to solve the canonical correlation problem is discussed in


A. Björck and G.H. Golub (1973). "Numerical Methods for Computing Angles Between
Linear Subspaces," Math. Comp. 27, 579-94.


G.H. Golub and H. Zha (1994). "Perturbation Analysis of the Canonical Correlations of
Matrix Pairs," Lin. Alg. and Its Applic. 210, 3-28.


The SVD has other roles to play in statistical computation.


S.J. Hammarling (1985). “The Singular Value Decomposition in Multivariate Statistics," 
ACM SIGNUM Newsletter 20, 2-25. 


12.5 Updating Matrix Factorizations 


In many applications it is necessary to re-factor a given matrix $A \in \mathbb{R}^{m\times n}$
after it has been altered in some minimal sense. For example, given that
we have the QR factorization of $A$, we may need to calculate the QR fac-
torization of a matrix $\tilde A$ that is obtained by (a) adding a general rank-one
matrix to $A$, (b) appending a row (or column) to $A$, or (c) deleting a row
(or column) from $A$. In this section we show that in situations like these, it
is much more efficient to “update” $A$’s QR factorization than to generate it
from scratch. We also show how to update the null space of a matrix after
it has been augmented with an additional row.

Before beginning, we mention that there are also techniques for updat-
ing the factorizations $PA = LU$, $A = GG^T$, and $A = LDL^T$. Updating
these factorizations, however, can be quite delicate because of pivoting re-
quirements and because when we tamper with a positive definite matrix the
result may not be positive definite. See Gill, Golub, Murray, and Saunders
(1974) and Stewart (1979). Along these lines we briefly discuss hyperbolic
transformations and their use in the Cholesky downdating problem.

Familiarity with §3.5, §4.1, §5.1, §5.2, §5.4, and §5.5 is required. Com-
plementary reading includes Gill, Murray, and Wright (1991).


12.5.1 Rank-One Changes


Suppose we have the QR factorization $QR = B \in \mathbb{R}^{n\times n}$ and that we need
to compute the QR factorization $B + uv^T = Q_1R_1$ where $u, v \in \mathbb{R}^n$ are
given. Observe that

$$B + uv^T = Q(R + wv^T) \tag{12.5.1}$$

where $w = Q^Tu$. Suppose that we compute rotations $J_{n-1},\ldots,J_2,J_1$ such
that

$$J_1^T \cdots J_{n-1}^T w = \pm\|w\|_2\, e_1.$$

Here, each $J_k$ is a rotation in planes $k$ and $k+1$. (For details, see Algorithm
5.1.3.) If these same Givens rotations are applied to $R$, it can be shown
that

$$H = J_1^T \cdots J_{n-1}^T R \tag{12.5.2}$$

is upper Hessenberg. For example, in the $n = 4$ case we start with

        x x x x          x
    R = 0 x x x      w = x
        0 0 x x          x
        0 0 0 x          x

and then update as follows:

                   x x x x                   x
    R = J_3^T R =  0 x x x     w = J_3^T w = x
                   0 0 x x                   x
                   0 0 x x                   0

                   x x x x                   x
    R = J_2^T R =  0 x x x     w = J_2^T w = x
                   0 x x x                   0
                   0 0 x x                   0

                   x x x x                   x
    H = J_1^T R =  x x x x     w = J_1^T w = 0
                   0 x x x                   0
                   0 0 x x                   0

Consequently,

$$(J_1^T \cdots J_{n-1}^T)(R + wv^T) = H \pm \|w\|_2\, e_1v^T = H_1 \tag{12.5.3}$$

is also upper Hessenberg.

In Algorithm 5.2.3, we show how to compute the QR factorization of an
upper Hessenberg matrix in $O(n^2)$ flops. In particular, we can find Givens
rotations $G_k$, $k = 1{:}n-1$ such that

$$G_{n-1}^T \cdots G_1^T H_1 = R_1 \tag{12.5.4}$$

is upper triangular. Combining (12.5.1) through (12.5.4) we obtain the QR
factorization $B + uv^T = Q_1R_1$ where

$$Q_1 = QJ_{n-1} \cdots J_1G_1 \cdots G_{n-1}.$$

A careful assessment of the work reveals that about $26n^2$ flops are required.
The vector $w = Q^Tu$ requires $2n^2$ flops. Computing $H$ and accumulating
the $J_k$ into $Q$ involves $12n^2$ flops. Finally, computing $R_1$ and multiplying
the $G_k$ into $Q$ involves $12n^2$ flops.

The technique readily extends to the case when $B$ is rectangular. It can
also be generalized to compute the QR factorization of $B + UV^T$ where
$\mathrm{rank}(UV^T) = p > 1$.
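
The two sweeps of rotations in (12.5.2) and (12.5.4) can be followed in a few lines of code. The sketch below is an illustration for square matrices stored as lists of rows; the helper names `givens`, `rot_rows`, `rot_cols` and `qr_rank_one_update` are hypothetical, and no attempt is made to guard against overflow in forming the rotations.

```python
import math

def givens(a, b):
    """Return (c, s) such that G = [[c, s], [-s, c]] satisfies G^T (a, b)^T = (r, 0)^T."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, -b / r

def rot_rows(M, k, c, s):
    """Overwrite rows k, k+1 of M with the corresponding rows of G^T M."""
    for j in range(len(M[k])):
        x, y = M[k][j], M[k + 1][j]
        M[k][j], M[k + 1][j] = c * x - s * y, s * x + c * y

def rot_cols(M, k, c, s):
    """Overwrite columns k, k+1 of M with the corresponding columns of M G."""
    for row in M:
        x, y = row[k], row[k + 1]
        row[k], row[k + 1] = c * x - s * y, s * x + c * y

def qr_rank_one_update(Q, R, u, v):
    """Given B = Q R (n-by-n), return Q1, R1 with Q1 R1 = B + u v^T."""
    n = len(R)
    Q = [row[:] for row in Q]
    R = [row[:] for row in R]
    w = [[sum(Q[i][j] * u[i] for i in range(n))] for j in range(n)]  # w = Q^T u (n-by-1)
    for k in range(n - 2, -1, -1):       # J_{n-1}, ..., J_1: reduce w to a multiple of e_1
        c, s = givens(w[k][0], w[k + 1][0])
        rot_rows(w, k, c, s)
        rot_rows(R, k, c, s)             # R becomes upper Hessenberg
        rot_cols(Q, k, c, s)
    for j in range(n):                   # H_1 = H + (w_1 e_1) v^T
        R[0][j] += w[0][0] * v[j]
    for k in range(n - 1):               # G_1, ..., G_{n-1}: restore triangular form
        c, s = givens(R[k][k], R[k + 1][k])
        rot_rows(R, k, c, s)
        rot_cols(Q, k, c, s)
    return Q, R

n = 4
Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
rot_cols(Q, 0, math.cos(0.4), math.sin(0.4))    # make Q a nontrivial orthogonal matrix
rot_cols(Q, 2, math.cos(1.1), math.sin(1.1))
R = [[2.0, 1.0, 3.0, 0.0], [0.0, 1.0, -1.0, 2.0], [0.0, 0.0, 4.0, 1.0], [0.0, 0.0, 0.0, 3.0]]
u = [1.0, 2.0, -1.0, 0.5]
v = [0.5, -1.0, 2.0, 1.0]
Q1, R1 = qr_rank_one_update(Q, R, u, v)
```

Each rotation touches two rows of $R$ and two columns of $Q$, which is where the $O(n^2)$ operation count comes from.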


12.5.2 Appending or Deleting a Column 
Assume that we have the QR factorization

$$QR = A = [\,a_1,\ldots,a_n\,], \qquad a_i \in \mathbb{R}^m, \tag{12.5.5}$$

and partition the upper triangular matrix $R \in \mathbb{R}^{m\times n}$ as follows:

$$R = \begin{bmatrix} R_{11} & v & R_{13} \\ 0 & r_{kk} & w^T \\ 0 & 0 & R_{33} \end{bmatrix}$$

with block rows of size $k-1$, $1$, $m-k$ and block columns of size $k-1$, $1$, $n-k$.
Now suppose that we want to compute the QR factorization of

$$\tilde A = [\,a_1,\ldots,a_{k-1},a_{k+1},\ldots,a_n\,] \in \mathbb{R}^{m\times(n-1)}.$$

Note that $\tilde A$ is just $A$ with its $k$th column deleted and that

$$Q^T\tilde A = \begin{bmatrix} R_{11} & R_{13} \\ 0 & w^T \\ 0 & R_{33} \end{bmatrix} = H$$

is upper Hessenberg, e.g.,

        x x x x x
        0 x x x x
        0 0 x x x
    H = 0 0 x x x        m = 7, n = 6, k = 3
        0 0 0 x x
        0 0 0 0 x
        0 0 0 0 0

Clearly, the unwanted subdiagonal elements $h_{k+1,k},\ldots,h_{n,n-1}$ can be ze-
roed by a sequence of Givens rotations: $G_{n-1}^T \cdots G_k^TH = R_1$. Here, $G_i$ is
a rotation in planes $i$ and $i+1$ for $i = k{:}n-1$. Thus, if $Q_1 = QG_k \cdots G_{n-1}$
then $\tilde A = Q_1R_1$ is the QR factorization of $\tilde A$.

The above update procedure can be executed in $O(n^2)$ flops and is
very useful in certain least squares problems. For example, one may wish
to examine the significance of the $k$th factor in the underlying model by
deleting the $k$th column of the corresponding data matrix and solving the
resulting LS problem.
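
A minimal sketch of the column-deletion update follows, with matrices stored as lists of rows and a zero-based column index. The names `givens`, `rot_rows`, `rot_cols` and `qr_delete_column` are hypothetical, and the test matrices are made up for illustration.

```python
import math

def givens(a, b):
    """Return (c, s) such that G = [[c, s], [-s, c]] satisfies G^T (a, b)^T = (r, 0)^T."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, -b / r

def rot_rows(M, k, c, s):
    """Overwrite rows k, k+1 of M with the corresponding rows of G^T M."""
    for j in range(len(M[k])):
        x, y = M[k][j], M[k + 1][j]
        M[k][j], M[k + 1][j] = c * x - s * y, s * x + c * y

def rot_cols(M, k, c, s):
    """Overwrite columns k, k+1 of M with the corresponding columns of M G."""
    for row in M:
        x, y = row[k], row[k + 1]
        row[k], row[k + 1] = c * x - s * y, s * x + c * y

def qr_delete_column(Q, R, k):
    """Given A = Q R (Q m-by-m, R m-by-n upper triangular), return Q1, R1 with
    Q1 R1 equal to A with column k (zero-based) removed."""
    m, n = len(R), len(R[0])
    Q = [row[:] for row in Q]
    H = [row[:k] + row[k + 1:] for row in R]     # Q^T A~, upper Hessenberg
    for i in range(k, min(n - 1, m - 1)):        # zero H[i+1][i] with a rotation in planes i, i+1
        c, s = givens(H[i][i], H[i + 1][i])
        rot_rows(H, i, c, s)
        rot_cols(Q, i, c, s)
    return Q, H

m, n = 4, 3
Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
for plane, angle in ((0, 0.4), (2, 1.1), (1, -0.7)):
    rot_cols(Q, plane, math.cos(angle), math.sin(angle))   # build a nontrivial orthogonal Q
R = [[2.0, 1.0, 3.0], [0.0, 1.0, -1.0], [0.0, 0.0, 4.0], [0.0, 0.0, 0.0]]
A = [[sum(Q[i][l] * R[l][j] for l in range(m)) for j in range(n)] for i in range(m)]
Q1, R1 = qr_delete_column(Q, R, 0)
```

Deleting the first column is the most expensive case, since every remaining column then needs one rotation.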

In a similar vein, it is useful to be able to compute efficiently the solution
to the LS problem after a column has been appended to $A$. Suppose we have
the QR factorization (12.5.5) and now wish to compute the QR factorization
of

$$\tilde A = [\,a_1,\ldots,a_k,\ z,\ a_{k+1},\ldots,a_n\,]$$

where $z \in \mathbb{R}^m$ is given. Note that if $w = Q^Tz$ then

$$Q^T\tilde A = [\,Q^Ta_1,\ldots,Q^Ta_k,\ w,\ Q^Ta_{k+1},\ldots,Q^Ta_n\,]$$

is upper triangular except for the presence of a “spike” in its $(k+1)$-st column,
e.g.,

              x x x x x x
              0 x x x x x
              0 0 x x x x
    Q^T A~ =  0 0 0 x x x        m = 7, n = 5, k = 3
              0 0 0 x 0 x
              0 0 0 x 0 0
              0 0 0 x 0 0

It is possible to determine Givens rotations $J_{m-1},\ldots,J_{k+1}$ so that

$$J_{k+1}^T \cdots J_{m-1}^T w = \begin{bmatrix} w_1 \\ \vdots \\ w_{k+1} \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

with $J_{k+1}^T \cdots J_{m-1}^T (Q^T\tilde A) = \tilde R$ upper triangular. We illustrate this by continuing
with the above example:

                         x x x x x x
                         0 x x x x x
                         0 0 x x x x
    H = J_6^T (Q^T A~) = 0 0 0 x x x
                         0 0 0 x 0 x
                         0 0 0 x 0 0
                         0 0 0 0 0 0

                         x x x x x x
                         0 x x x x x
                         0 0 x x x x
    H = J_5^T H =        0 0 0 x x x
                         0 0 0 x 0 x
                         0 0 0 0 0 x
                         0 0 0 0 0 0

                         x x x x x x
                         0 x x x x x
                         0 0 x x x x
    H = J_4^T H =        0 0 0 x x x
                         0 0 0 0 x x
                         0 0 0 0 0 x
                         0 0 0 0 0 0


This update requires $O(mn)$ flops.


12.5.3  Appending or Deleting a Row 


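Suppose we have the QR factorization $QR = A \in \mathbb{R}^{m\times n}$ and now wish to
obtain the QR factorization of

$$\tilde A = \begin{bmatrix} w^T \\ A \end{bmatrix}$$

where $w \in \mathbb{R}^n$. Note that

$$\mathrm{diag}(1, Q^T)\,\tilde A = \begin{bmatrix} w^T \\ R \end{bmatrix} = H$$

is upper Hessenberg. Thus, Givens rotations $J_1,\ldots,J_n$ could be determined
so $J_n^T \cdots J_1^TH = R_1$ is upper triangular. It follows that

$$\tilde A = Q_1R_1$$

is the desired QR factorization, where $Q_1 = \mathrm{diag}(1,Q)J_1 \cdots J_n$.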
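
The row-append update can be sketched in the same style. The code below is an illustration with matrices stored as lists of rows; the names `givens`, `rot_rows`, `rot_cols` and `qr_append_row` are hypothetical, and the new row is placed first, as in the text.

```python
import math

def givens(a, b):
    """Return (c, s) such that G = [[c, s], [-s, c]] satisfies G^T (a, b)^T = (r, 0)^T."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, -b / r

def rot_rows(M, k, c, s):
    """Overwrite rows k, k+1 of M with the corresponding rows of G^T M."""
    for j in range(len(M[k])):
        x, y = M[k][j], M[k + 1][j]
        M[k][j], M[k + 1][j] = c * x - s * y, s * x + c * y

def rot_cols(M, k, c, s):
    """Overwrite columns k, k+1 of M with the corresponding columns of M G."""
    for row in M:
        x, y = row[k], row[k + 1]
        row[k], row[k + 1] = c * x - s * y, s * x + c * y

def qr_append_row(Q, R, w):
    """Given A = Q R (Q m-by-m, R m-by-n), return Q1, R1 with Q1 R1 = [w^T; A]."""
    m, n = len(R), len(R[0])
    Q1 = [[1.0] + [0.0] * m] + [[0.0] + row[:] for row in Q]   # diag(1, Q)
    H = [list(w)] + [row[:] for row in R]                      # upper Hessenberg, (m+1)-by-n
    for k in range(min(n, m)):                                 # J_1, ..., J_n
        c, s = givens(H[k][k], H[k + 1][k])
        rot_rows(H, k, c, s)
        rot_cols(Q1, k, c, s)
    return Q1, H

m, n = 3, 2
Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
for plane, angle in ((0, 0.4), (1, -0.7)):
    rot_cols(Q, plane, math.cos(angle), math.sin(angle))       # build a nontrivial orthogonal Q
R = [[2.0, 1.0], [0.0, 3.0], [0.0, 0.0]]
A = [[sum(Q[i][l] * R[l][j] for l in range(m)) for j in range(n)] for i in range(m)]
w = [1.0, -2.0]
Q1, R1 = qr_append_row(Q, R, w)
```

Only the first $n$ subdiagonal entries of $H$ can be nonzero, so $n$ rotations suffice regardless of $m$.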

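No essential complications result if the new row is added between rows
$k$ and $k+1$ of $A$. We merely apply the above with $A$ replaced by $PA$ and
$Q$ replaced by $PQ$ where $P$ is the appropriate permutation matrix.
Upon completion $\mathrm{diag}(1, P^T)Q_1$ is the desired orthogonal factor.

Lastly, we consider how to update the QR factorization $QR = A \in \mathbb{R}^{m\times n}$
when the first row of $A$ is deleted. In particular, we wish to compute the
QR factorization of the submatrix $A_1$ in

$$A = \begin{bmatrix} z^T \\ A_1 \end{bmatrix}, \qquad z \in \mathbb{R}^n,\quad A_1 \in \mathbb{R}^{(m-1)\times n}.$$

(The procedure is similar when an arbitrary row is deleted.) Let $q^T$ be the
first row of $Q$ and compute Givens rotations $G_1,\ldots,G_{m-1}$ such that

$$G_1^T \cdots G_{m-1}^T q = \alpha e_1$$

where $\alpha = \pm 1$. Note that

$$H = G_1^T \cdots G_{m-1}^T R = \begin{bmatrix} v^T \\ R_1 \end{bmatrix}, \qquad v \in \mathbb{R}^n,\quad R_1 \in \mathbb{R}^{(m-1)\times n},$$

is upper Hessenberg and that

$$QG_{m-1} \cdots G_1 = \begin{bmatrix} \alpha & 0 \\ 0 & Q_1 \end{bmatrix}$$

where $Q_1 \in \mathbb{R}^{(m-1)\times(m-1)}$ is orthogonal. Thus,

$$A = \begin{bmatrix} z^T \\ A_1 \end{bmatrix} = (QG_{m-1} \cdots G_1)(G_1^T \cdots G_{m-1}^T R) = \begin{bmatrix} \alpha & 0 \\ 0 & Q_1 \end{bmatrix} \begin{bmatrix} v^T \\ R_1 \end{bmatrix}$$

from which we conclude that $A_1 = Q_1R_1$ is the desired QR factorization.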


12.5.4 Hyperbolic Transformation Methods 


Recall that the "R" in $A = QR$ is the transposed Cholesky factor in $A^TA = GG^T$.
Thus, there is a close connection between the QR modifications just
discussed and analogous modifications of the Cholesky factorization. We
illustrate this with the Cholesky downdating problem which corresponds to
the removal of an $A$-row in QR. In the Cholesky downdating problem we
have the Cholesky factorization

$$GG^T = A^TA = \begin{bmatrix} z^T \\ A_1 \end{bmatrix}^T \begin{bmatrix} z^T \\ A_1 \end{bmatrix} \qquad (12.5.6)$$

where $A \in \mathbb{R}^{m\times n}$ with $m > n$ and $z \in \mathbb{R}^n$. Our task is to find a lower
triangular $G_1$ such that $G_1G_1^T = A_1^TA_1$. There are several approaches to
this interesting and important problem. Simply because it is an opportunity
to introduce some new ideas, we present a downdating procedure that relies
on hyperbolic transformations.
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We start with a definition. H € R”*™ is pseudo-orthogonal with respect 
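to the signature matrix $S = \mathrm{diag}(\pm 1) \in \mathbb{R}^{m\times m}$ if $H^TSH = S$. Now from
(12.5.6) we have $A^TA = A_1^TA_1 + zz^T = GG^T$ and so

$$A_1^TA_1 = A^TA - zz^T = GG^T - zz^T = \begin{bmatrix} G & z \end{bmatrix} \begin{bmatrix} I_n & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} G^T \\ z^T \end{bmatrix}.$$

Define the signature matrix

$$S = \begin{bmatrix} I_n & 0 \\ 0 & -1 \end{bmatrix} \qquad (12.5.7)$$

and suppose that we can find $H \in \mathbb{R}^{(n+1)\times(n+1)}$ such that $H^TSH = S$ with
the property that

$$H \begin{bmatrix} G^T \\ z^T \end{bmatrix} = \begin{bmatrix} G_1^T \\ 0 \end{bmatrix} \qquad (12.5.8)$$

is upper triangular. It follows that

$$A_1^TA_1 = \begin{bmatrix} G & z \end{bmatrix} H^TSH \begin{bmatrix} G^T \\ z^T \end{bmatrix} = \begin{bmatrix} G_1 & 0 \end{bmatrix} S \begin{bmatrix} G_1^T \\ 0 \end{bmatrix} = G_1G_1^T$$

is the sought after Cholesky factorization.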
We now show how to construct the hyperbolic transformation $H$ in
(12.5.8) using hyperbolic rotations. A 2-by-2 hyperbolic rotation has the
form

$$H = \begin{bmatrix} \cosh(\theta) & -\sinh(\theta) \\ -\sinh(\theta) & \cosh(\theta) \end{bmatrix} = \begin{bmatrix} c & -s \\ -s & c \end{bmatrix}.$$

Note that if $H \in \mathbb{R}^{2\times 2}$ is a hyperbolic rotation then $H^TSH = S$ where
$S = \mathrm{diag}(-1,1)$. Paralleling our Givens rotations developments, let us see how
hyperbolic rotations can be used for zeroing. From

$$\begin{bmatrix} c & -s \\ -s & c \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ 0 \end{bmatrix}, \qquad c^2 - s^2 = 1,$$

we obtain the equation $cx_2 = sx_1$. Note that there is no solution to this
equation if $x_1 = x_2 \neq 0$, a clue that hyperbolic rotations are not as
numerically solid as their Givens rotation counterparts. If $x_1 \neq x_2$ then it is
possible to compute the cosh-sinh pair:

    if x_2 = 0
        s = 0; c = 1
    else                                                      (12.5.9)
        if |x_2| < |x_1|
            τ = x_2/x_1; c = 1/sqrt(1 − τ^2); s = cτ
        elseif |x_1| < |x_2|
            τ = x_1/x_2; s = 1/sqrt(1 − τ^2); c = sτ
        end
    end
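The first branch of (12.5.9) can be sketched as follows. This is an illustrative sketch only: the function name `hyperbolic_pair` is hypothetical, and the sketch deliberately covers just the case |x_2| < |x_1|, where the computed pair satisfies c^2 - s^2 = 1; the other branch of (12.5.9) is left out and reported as an error.

```python
import math

def hyperbolic_pair(x1, x2):
    """Return (c, s) with c*c - s*s = 1 and -s*x1 + c*x2 = 0.

    Only the case |x2| < |x1| is handled (an assumption of this sketch).
    """
    if x2 == 0.0:
        return 1.0, 0.0
    if abs(x2) >= abs(x1):
        raise ValueError("requires |x2| < |x1|")
    tau = x2 / x1
    c = 1.0 / math.sqrt(1.0 - tau * tau)
    return c, c * tau

# Usage: zero the second component of (5, 3).
c, s = hyperbolic_pair(5.0, 3.0)
y1 = c * 5.0 - s * 3.0      # the surviving component
y2 = -s * 5.0 + c * 3.0     # should vanish
```

For this input the rotation preserves the hyperbolic "length": y1^2 equals x_1^2 - x_2^2 = 16 up to rounding.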
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Observe that the norm of the hyperbolic rotation produced by this algo- 
rithm gets large as zı gets close to T32. 

Now any matrix $H = H(p, n+1, \theta) \in \mathbb{R}^{(n+1)\times(n+1)}$ that is the identity
everywhere except $h_{pp} = h_{n+1,n+1} = \cosh(\theta)$ and $h_{p,n+1} = h_{n+1,p} = -\sinh(\theta)$
satisfies $H^TSH = S$ where $S$ is prescribed in (12.5.7). Using
(12.5.9), we attempt to generate hyperbolic rotations $H_k = H(k, n+1, \theta_k)$ for
$k = 1{:}n$ so that

$$H_n \cdots H_1 \begin{bmatrix} G^T \\ z^T \end{bmatrix} = \begin{bmatrix} G_1^T \\ 0 \end{bmatrix}.$$

This turns out to be possible if $A$ has full column rank. Hyperbolic rotation
$H_k$ zeros entry $(n+1, k)$. In other words, if $A$ has full column rank, then
it can be shown that each call to (12.5.9) results in a cosh-sinh pair. See
Alexander, Pan, and Plemmons (1988).
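The complete downdating sweep can be sketched as below. This is a minimal sketch under stated assumptions rather than a robust implementation: G is a dense lower triangular list of rows with positive diagonal, the rotations are applied to the columns of [G z] (the transpose of the row formulation above), and the function name `cholesky_downdate` is hypothetical.

```python
import math

def cholesky_downdate(G, z):
    """Return lower triangular G1 with G1 G1^T = G G^T - z z^T.

    Step k applies a hyperbolic rotation to column k of G and the vector z,
    chosen so that the k-th component of z becomes zero.
    """
    n = len(z)
    G1 = [list(row) for row in G]
    z = list(z)
    for k in range(n):
        tau = z[k] / G1[k][k]
        if abs(tau) >= 1.0:
            # no cosh-sinh pair exists: G G^T - z z^T is not positive definite
            raise ValueError("downdate breaks down at step %d" % k)
        c = 1.0 / math.sqrt(1.0 - tau * tau)
        s = c * tau
        for i in range(k, n):
            g, zi = G1[i][k], z[i]
            G1[i][k] = c * g - s * zi
            z[i] = -s * g + c * zi
    return G1
```

The quantity 1 - tau^2 is exactly what becomes small when the rotation is ill conditioned, which is where the sketch would lose accuracy in practice.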


12.5.5 Updating the ULV Decomposition 


Suppose $A \in \mathbb{R}^{m\times n}$ is rank deficient and that we have a basis for its null
space. If we add a row to $A$,

$$\tilde{A} = \begin{bmatrix} A \\ a^T \end{bmatrix},$$

then how easy is it to compute a null basis for $\tilde{A}$? When a sequence of
such update problems is involved the issue is one of tracking the null
space. Subspace tracking arises in a number of real-time signal processing
applications.

Working with the SVD is awkward in this context because $O(n^3)$ flops
are required to recompute the SVD of a matrix that has undergone a unit
rank perturbation. On the other hand, Stewart (1993) has shown that the
null space updating problem becomes $O(n^2)$ per step if we properly couple
the ideas of condition estimation of §3.5.4 and complete orthogonal
decomposition. Recall from §5.4.2 that a complete orthogonal decomposition is
two-sided and reveals the rank of the underlying matrix,

$$U^TAV = \begin{bmatrix} T_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad T_{11} \in \mathbb{R}^{r\times r},\ r = \mathrm{rank}(A).$$

A pair of QR factorizations (one with column pivoting) can be used to
compute this. In this case $T_{11} = L$ is lower triangular in exact arithmetic.
But with noise and roundoff we instead compute

$$U^TAV = \begin{bmatrix} L & 0 \\ H & E \\ 0 & 0 \end{bmatrix} \qquad (12.5.10)$$

where $L \in \mathbb{R}^{r\times r}$ and $E \in \mathbb{R}^{(n-r)\times(n-r)}$ are lower triangular and $H$ and $E$
are "small" compared to $\sigma_{\min}(L)$. In this case we refer to (12.5.10) as a
rank-revealing ULV decomposition.¹ Note that if

$$V = [\,V_1\ \ V_2\,], \qquad U = [\,U_1\ \ U_2\ \ U_3\,],$$

where $V_1$ and $U_1$ have $r$ columns, $V_2$ and $U_2$ have $n-r$ columns, and $U_3$ has $m-n$ columns,
then the columns of $V_2$ define an approximate null space:

$$\| AV_2 \|_2 = \| U_2E \|_2 \le \| E \|_2.$$


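Our goal is to produce a rank-revealing ULV decomposition for the
row-appended matrix $\tilde{A}$. To be more specific, our aim is to show how to produce
updates of $L$, $E$, $H$, $V$, and (possibly) the rank in $O(n^2)$ flops.

Note that if $a^T$ denotes the appended row, then

$$\begin{bmatrix} U & 0 \\ 0 & 1 \end{bmatrix}^T \begin{bmatrix} A \\ a^T \end{bmatrix} V = \begin{bmatrix} L & 0 \\ H & E \\ 0 & 0 \\ w^T & y^T \end{bmatrix}.$$

By permuting the bottom row up "underneath" $H$ and $E$ we see that the
challenge is to compute a rank-revealing ULV decomposition of

$$\begin{bmatrix} L & 0 \\ H & E \\ w^T & y^T \end{bmatrix} =
\left[\begin{array}{cccc|ccc}
\ell & 0 & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & \ell & 0 & 0 & 0\\ \hline
h & h & h & h & e & 0 & 0\\
h & h & h & h & e & e & 0\\
h & h & h & h & e & e & e\\ \hline
w & w & w & w & y & y & y
\end{array}\right] \qquad (12.5.11)$$

in $O(n^2)$ flops.

¹ Dual to this is the URV decomposition in which the rank-revealing form is upper
triangular. There are updating situations that sometimes favor the manipulation of this
form instead of ULV.

Here and in the sequel, we set $r = 4$ and $n = 7$ to illustrate
the main ideas. Bear in mind that the $h$ and $e$ entries are small and that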
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we have deduced that the numerical rank is four. In practice, this involves 
comparisons with a small tolerance as discussed in §5.5.7. 

Using zeroing techniques similar to those presented in §12.5.3, the bot- 
tom row can be zeroed with a sequence of row rotations giving 


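$$\left[\begin{array}{ccccccc}
\times & 0 & 0 & 0 & 0 & 0 & 0\\
\times & \times & 0 & 0 & 0 & 0 & 0\\
\times & \times & \times & 0 & 0 & 0 & 0\\
\times & \times & \times & \times & 0 & 0 & 0\\
\times & \times & \times & \times & \times & 0 & 0\\
\times & \times & \times & \times & \times & \times & 0\\
\times & \times & \times & \times & \times & \times & \times\\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]$$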


Because this zeroing process intermingles the (presumably large) entries of 
the bottom row with the entries from each of the other rows, the triangular 
form typically is not rank revealing. However, we can restore the rank- 
revealing structure with a combination of condition estimation and zero- 
chasing with rotations. Let us assume that with the added row, the new 
null space has dimension two. 

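With a reliable condition estimator we produce a unit 2-norm vector $p$
such that

$$\| p^TL \|_2 \approx \sigma_{\min}(L).$$

See §3.5.4. Rotations $\{U_{i,i+1}\}_{i=1}^{6}$ can be found such that

$$U_{67}^T U_{56}^T U_{45}^T U_{34}^T U_{23}^T U_{12}^T\, p = e_7 = I_7(:,7).$$

The matrix

$$H = U_{67}^T U_{56}^T U_{45}^T U_{34}^T U_{23}^T U_{12}^T L$$

is lower Hessenberg and can be restored to a lower triangular form $L_+$ by
a sequence of column rotations:

$$L_+ = H V_{12}V_{23}V_{34}V_{45}V_{56}V_{67}.$$

It follows that

$$e_7^TL_+ = (e_7^TH)V_{12}V_{23}V_{34}V_{45}V_{56}V_{67} = (p^TL)V_{12}V_{23}V_{34}V_{45}V_{56}V_{67}$$

has approximate norm $\sigma_{\min}(L)$. Thus, we obtain a lower triangular matrix
of the form

$$L_+ = \left[\begin{array}{cccccc|c}
\times & 0 & 0 & 0 & 0 & 0 & 0\\
\times & \times & 0 & 0 & 0 & 0 & 0\\
\times & \times & \times & 0 & 0 & 0 & 0\\
\times & \times & \times & \times & 0 & 0 & 0\\
\times & \times & \times & \times & \times & 0 & 0\\
\times & \times & \times & \times & \times & \times & 0\\ \hline
h & h & h & h & h & h & e
\end{array}\right]$$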
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with small h’s and e. We can repeat the condition estimation and zero 
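chasing on the leading 6-by-6 portion thereby producing (perhaps) another
row of small numbers:

$$\left[\begin{array}{ccccc|cc}
\times & 0 & 0 & 0 & 0 & 0 & 0\\
\times & \times & 0 & 0 & 0 & 0 & 0\\
\times & \times & \times & 0 & 0 & 0 & 0\\
\times & \times & \times & \times & 0 & 0 & 0\\
\times & \times & \times & \times & \times & 0 & 0\\ \hline
h & h & h & h & h & e & 0\\
h & h & h & h & h & e & e
\end{array}\right]$$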


(If not, then the revealed rank is 6.) Continuing in this way, we can restore 
any lower triangular matrix to rank-revealing form. 

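In the event that the $y$ vector in (12.5.11) is small, we can reach
rank-revealing form by a different, more efficient route. We start with a sequence
of left and right Givens rotations to zero all but the first component of $y$.
The column rotation $V_{67}$ zeros the last component of $y$ and fills in the (6,7)
entry, which the row rotation $U_{67}$ then removes; $V_{56}$ and $U_{56}$ do the same for
the second component of $y$ and the (5,6) entry:

$$\left[\begin{array}{cccc|ccc}
\ell & 0 & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & \ell & 0 & 0 & 0\\
h & h & h & h & e & 0 & 0\\
h & h & h & h & e & e & 0\\
h & h & h & h & e & e & e\\
w & w & w & w & y & y & y
\end{array}\right]
\ \xrightarrow{V_{67},\ U_{67}}\
\left[\begin{array}{cccc|ccc}
\ell & 0 & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & \ell & 0 & 0 & 0\\
h & h & h & h & e & 0 & 0\\
h & h & h & h & e & e & 0\\
h & h & h & h & e & e & e\\
w & w & w & w & y & y & 0
\end{array}\right]
\ \xrightarrow{V_{56},\ U_{56}}\
\left[\begin{array}{cccc|ccc}
\ell & 0 & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & \ell & 0 & 0 & 0\\
h & h & h & h & e & 0 & 0\\
h & h & h & h & e & e & 0\\
h & h & h & h & e & e & e\\
w & w & w & w & y & 0 & 0
\end{array}\right]$$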
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Here, “U;;” means a rotation of rows i and j and "*Vi;" means a rotation of 
columns i and j. It is important to observe that there is no intermingling 
of small and large numbers during this process. The h’s and e's are still 
small. 


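Following this, we produce a sequence of rotations that transform the
matrix to

$$\left[\begin{array}{cccc|ccc}
\ell & 0 & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & \ell & 0 & 0 & 0\\
h & h & h & h & e & 0 & 0\\
h & h & h & h & e & e & 0\\
h & h & h & h & e & e & e\\
y & y & y & y & y & 0 & 0
\end{array}\right] \qquad (12.5.12)$$

where all the $y$'s are small: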


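$$\xrightarrow{U_{48}}\
\left[\begin{array}{cccc|ccc}
\ell & 0 & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & \ell & \mu & 0 & 0\\
h & h & h & h & e & 0 & 0\\
h & h & h & h & e & e & 0\\
h & h & h & h & e & e & e\\
x & x & x & 0 & y & 0 & 0
\end{array}\right]
\ \xrightarrow{U_{38},\ U_{28},\ U_{18}}\
\left[\begin{array}{cccc|ccc}
\ell & 0 & 0 & 0 & \mu & 0 & 0\\
\ell & \ell & 0 & 0 & \mu & 0 & 0\\
\ell & \ell & \ell & 0 & \mu & 0 & 0\\
\ell & \ell & \ell & \ell & \mu & 0 & 0\\
h & h & h & h & e & 0 & 0\\
h & h & h & h & e & e & 0\\
h & h & h & h & e & e & e\\
0 & 0 & 0 & 0 & y & 0 & 0
\end{array}\right]$$

Each row rotation $U_{k8}$ zeros entry $(8,k)$ and introduces a small entry $\mu$ in position $(k,5)$.
Note that $y$ is small because of 2-norm preservation. Column rotations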
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in planes (1,5), (2,5), (3,5), and (4,5) can remove the js: 


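$$\xrightarrow{V_{15}}\
\left[\begin{array}{cccc|ccc}
\ell & 0 & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & 0 & 0 & \mu & 0 & 0\\
\ell & \ell & \ell & 0 & \mu & 0 & 0\\
\ell & \ell & \ell & \ell & \mu & 0 & 0\\
h & h & h & h & e & 0 & 0\\
h & h & h & h & e & e & 0\\
h & h & h & h & e & e & e\\
y & 0 & 0 & 0 & y & 0 & 0
\end{array}\right]
\ \xrightarrow{V_{25},\ V_{35},\ V_{45}}\
\left[\begin{array}{cccc|ccc}
\ell & 0 & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & \ell & 0 & 0 & 0\\
h & h & h & h & e & 0 & 0\\
h & h & h & h & e & e & 0\\
h & h & h & h & e & e & e\\
y & y & y & y & y & 0 & 0
\end{array}\right]$$

thus producing the structure displayed in (12.5.12). All the $y$'s are small
and thus a sequence of row rotations $U_{58}, U_{48}, \ldots, U_{18}$ can be constructed
to clean out the bottom row giving the rank-revealed form

$$\left[\begin{array}{cccc|ccc}
\ell & 0 & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & 0 & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & 0 & 0 & 0 & 0\\
\ell & \ell & \ell & \ell & 0 & 0 & 0\\
h & h & h & h & e & 0 & 0\\
h & h & h & h & e & e & 0\\
h & h & h & h & e & e & e\\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]$$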
Problems 


P12.5.1 Suppose we have the QR factorization for $A \in \mathbb{R}^{m\times n}$ and now wish to
minimize $\| (A + uv^T)x - b \|_2$ where $u, b \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$ are given. Give an algorithm for
solving this problem that requires $O(mn)$ flops. Assume that $Q$ must be updated.

P12.5.2 Suppose we have the QR factorization $QR = A \in \mathbb{R}^{m\times n}$. Give an algorithm
for computing the QR factorization of the matrix $\tilde{A}$ obtained by deleting the $k$th row of
$A$. Your algorithm should require $O(mn)$ flops.

P12.5.3 Suppose $T \in \mathbb{R}^{n\times n}$ is tridiagonal and symmetric and that $v \in \mathbb{R}^n$. Show how
the Lanczos algorithm can be used (in principle) to compute an orthogonal $Q \in \mathbb{R}^{n\times n}$
in $O(n^2)$ flops such that $Q^T(T + vv^T)Q = \tilde{T}$ is also tridiagonal.

P12.5.4 Suppose

$$A = \begin{bmatrix} c^T \\ B \end{bmatrix}, \qquad c \in \mathbb{R}^n,\ B \in \mathbb{R}^{(m-1)\times n}$$
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has full column rank and m > n. Using the Sherman-Morrison- Woodbury formula show 
that 


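$$\frac{1}{\sigma_{\min}(B)^2} \le \frac{1}{\sigma_{\min}(A)^2} + \frac{\| (A^TA)^{-1}c \|_2^2}{1 - c^T(A^TA)^{-1}c}.$$

P12.5.5 As a function of $x_1$ and $x_2$, what is the 2-norm of the hyperbolic rotation
produced by (12.5.9)?

P12.5.6 Show that the hyperbolic reduction in §12.5.4 does not break down if $A$ has
full column rank.

P12.5.7 Assume

$$A = \begin{bmatrix} R & H \\ 0 & E \end{bmatrix}$$

where $R$ and $E$ are square with

$$\rho = \frac{\| E \|_2}{\sigma_{\min}(R)} < 1.$$

Show that if

$$Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}$$

is orthogonal and

$$\begin{bmatrix} R & H \\ 0 & E \end{bmatrix} \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} = \begin{bmatrix} R_1 & 0 \\ H_1 & E_1 \end{bmatrix},$$

then $\| H_1 \|_2 \le \rho \| H \|_2$.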
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12.0  Modified/Structured Eigenproblems 


In this section we treat an array of constrained, inverse, and structured 
eigenvalue problems. Although the examples are not related, collectively 
they show how certain special eigenproblems can be solved using the basic 
factorization ideas presented in earlier chapters. 

The dependence of this section upon earlier portions of the book is as 
follows: 


    §§5.1, 5.2, 8.1, 8.3                        →  §12.6.1
    §§8.1, 8.3, 9.1                             →  §12.6.2
    §§4.7, 8.1                                  →  §12.6.3
    §§5.1, 5.2, 5.4, 7.4, 8.1, 8.2, 8.3, 8.6    →  §12.6.4


12.6.1 A Constrained Eigenvalue Problem 


Let $A \in \mathbb{R}^{n\times n}$ be symmetric. The gradient of $r(x) = x^TAx/x^Tx$ is zero if
and only if $x$ is an eigenvector of $A$. Thus the stationary values of $r(x)$ are
the eigenvalues of $A$.

In certain applications it is necessary to find the stationary values of $r(x)$
subject to the constraint $C^Tx = 0$ where $C \in \mathbb{R}^{n\times p}$ with $n \ge p$. Suppose

$$Q^TCZ = \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix}, \qquad S \in \mathbb{R}^{r\times r},\ r = \mathrm{rank}(C),$$

is a complete orthogonal decomposition of $C$. Define $B \in \mathbb{R}^{n\times n}$ by

$$Q^TAQ = B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}, \qquad B_{11} \in \mathbb{R}^{r\times r},\ B_{22} \in \mathbb{R}^{(n-r)\times(n-r)},$$

and set

$$y = Q^Tx = \begin{bmatrix} u \\ v \end{bmatrix}, \qquad u \in \mathbb{R}^r,\ v \in \mathbb{R}^{n-r}.$$

Since $C^Tx = 0$ transforms to $S^Tu = 0$, the original problem becomes one of
finding the stationary values of $r(y) = y^TBy/y^Ty$ subject to the constraint
that $u = 0$. But this amounts merely to finding the stationary values
(eigenvalues) of the $(n-r)$-by-$(n-r)$ symmetric matrix $B_{22}$.
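A small worked instance may help fix ideas. The matrices below are chosen here purely for illustration (they do not come from the text): with $n = 3$ and $p = 1$ the constraint simply removes the first coordinate, so the complete orthogonal decomposition is trivial.

```latex
A = \begin{bmatrix} 2 & 1 & 0\\ 1 & 2 & 1\\ 0 & 1 & 2 \end{bmatrix},
\qquad C = e_1 = \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}
\quad\Longrightarrow\quad Q = I_3,\ Z = 1,\ S = 1,\ r = 1.

B = Q^T A Q = A, \qquad
B_{22} = \begin{bmatrix} 2 & 1\\ 1 & 2 \end{bmatrix}, \qquad
\lambda(B_{22}) = \{1,\ 3\}.

% The constrained stationary values 1 and 3 interlace the eigenvalues of A:
\lambda(A) = \{2-\sqrt{2},\ 2,\ 2+\sqrt{2}\}, \qquad
2-\sqrt{2} \;\le\; 1 \;\le\; 2 \;\le\; 3 \;\le\; 2+\sqrt{2}.
```

The interlacing visible in the last line is the property exploited in the inverse problems of the next subsection.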
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12.6.2 Two Inverse Eigenvalue Problems 


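Consider the $r = 1$ case in the previous subsection. Let $\tilde\lambda_{n-1} \le \cdots \le \tilde\lambda_1$ be
the stationary values of $x^TAx/x^Tx$ subject to the constraint $c^Tx = 0$. From
Theorem 8.1.7, it is easy to show that these stationary values interlace the
eigenvalues $\lambda_i$ of $A$:

$$\lambda_n \le \tilde\lambda_{n-1} \le \lambda_{n-1} \le \cdots \le \lambda_2 \le \tilde\lambda_1 \le \lambda_1.$$

Now suppose that $A$ has distinct eigenvalues and that we are given the
values $\tilde\lambda_1,\ldots,\tilde\lambda_{n-1}$ that satisfy

$$\lambda_n < \tilde\lambda_{n-1} < \lambda_{n-1} < \cdots < \lambda_2 < \tilde\lambda_1 < \lambda_1.$$

We seek to determine a unit vector $c \in \mathbb{R}^n$ such that the $\tilde\lambda_i$ are the
stationary values of $x^TAx$ subject to $x^Tx = 1$ and $c^Tx = 0$.

In order to determine the properties that $c$ must have, we use the method
of Lagrange multipliers. Equating the gradient of

$$\phi(x, \lambda, \mu) = x^TAx - \lambda(x^Tx - 1) + 2\mu x^Tc$$

to zero we obtain the important equation $(A - \lambda I)x = -\mu c$. Thus, $A - \lambda I$ is
nonsingular and so $x = -\mu(A - \lambda I)^{-1}c$. Applying $c^T$ to both sides of this
equation and substituting the eigenvalue decomposition $Q^TAQ = \mathrm{diag}(\lambda_i)$
we obtain

$$0 = \sum_{i=1}^{n} \frac{d_i^2}{\lambda_i - \lambda}$$

where $d = Q^Tc$, i.e.,

$$p(\lambda) \equiv \sum_{i=1}^{n} d_i^2 \prod_{j=1,\ j\ne i}^{n} (\lambda_j - \lambda) = 0.$$

Notice that $1 = \| c \|_2^2 = \| d \|_2^2 = d_1^2 + \cdots + d_n^2$ is the coefficient of $(-\lambda)^{n-1}$.
Since $p(\lambda)$ is a polynomial having zeroes $\tilde\lambda_1,\ldots,\tilde\lambda_{n-1}$ we must have

$$p(\lambda) = \prod_{j=1}^{n-1} (\tilde\lambda_j - \lambda).$$

Evaluating both expressions for $p(\lambda)$ at $\lambda = \lambda_k$ gives

$$d_k^2 = \frac{\prod_{j=1}^{n-1} (\tilde\lambda_j - \lambda_k)}{\prod_{j=1,\ j\ne k}^{n} (\lambda_j - \lambda_k)}, \qquad k = 1{:}n. \qquad (12.6.1)$$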
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This determines each d, up to its sign. Thus there are 2” different solutions 
c = Qd to the original problem. 
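Formula (12.6.1) is straightforward to evaluate. The sketch below assumes the eigenvalues are supplied as plain Python lists satisfying the strict interlacing condition; the function name `constraint_weights` is hypothetical.

```python
import math

def constraint_weights(lam, lam_tilde):
    """Return [d_1^2, ..., d_n^2] as given by (12.6.1).

    lam       : the n distinct eigenvalues of A
    lam_tilde : the n-1 prescribed stationary values (strictly interlacing)
    """
    d2 = []
    for k, lam_k in enumerate(lam):
        num = math.prod(mu - lam_k for mu in lam_tilde)
        den = math.prod(lam_j - lam_k for j, lam_j in enumerate(lam) if j != k)
        d2.append(num / den)
    return d2

# Usage: eigenvalues 5 > 3 > 1 with prescribed stationary values 4 and 2.
d2 = constraint_weights([5.0, 3.0, 1.0], [4.0, 2.0])
```

The weights are positive precisely because of the interlacing, and they sum to one as the normalization of c requires; taking square roots with any choice of signs then gives d.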
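A related inverse eigenvalue problem involves finding a tridiagonal matrix

$$T = \begin{bmatrix}
\alpha_1 & \beta_1 & & 0\\
\beta_1 & \alpha_2 & \ddots & \\
 & \ddots & \ddots & \beta_{n-1}\\
0 & & \beta_{n-1} & \alpha_n
\end{bmatrix}$$

such that $T$ has prescribed eigenvalues $\{\lambda_1,\ldots,\lambda_n\}$ and $T(2{:}n, 2{:}n)$ has
prescribed eigenvalues $\{\tilde\lambda_1,\ldots,\tilde\lambda_{n-1}\}$ with

$$\lambda_1 > \tilde\lambda_1 > \lambda_2 > \cdots > \lambda_{n-1} > \tilde\lambda_{n-1} > \lambda_n.$$

We show how to compute the tridiagonal $T$ via the Lanczos process. Note
that the $\tilde\lambda_i$ are the stationary values of

$$\phi(y) = \frac{y^T\Lambda y}{y^Ty}$$

subject to $d^Ty = 0$ where $\Lambda = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ and $d$ is specified by (12.6.1).
If we apply the Lanczos iteration (9.1.3) with $A = \Lambda$ and $q_1 = d$, then it
produces an orthogonal matrix $Q$ and a tridiagonal matrix $T$ such that
$Q^T\Lambda Q = T$. With the definition $x = Q^Ty$, it follows that the $\tilde\lambda_i$ are the
stationary values of

$$\psi(x) = \frac{x^TTx}{x^Tx}$$

subject to $e_1^Tx = 0$. But these are precisely the eigenvalues of $T(2{:}n, 2{:}n)$.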
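The construction can be tried on a small case. The sketch below is illustrative only: it runs plain Lanczos (no reorthogonalization, which is an assumption acceptable only for tiny n) on a diagonal matrix, and the helper names `weights` and `lanczos_diag` are hypothetical.

```python
import math

def weights(lam, lam_tilde):
    """Squared components d_k^2 from (12.6.1)."""
    return [math.prod(mu - lk for mu in lam_tilde) /
            math.prod(lj - lk for j, lj in enumerate(lam) if j != k)
            for k, lk in enumerate(lam)]

def lanczos_diag(lam, q1):
    """Lanczos on diag(lam) started at the unit vector q1.

    Returns (alpha, beta): the diagonal and off-diagonal of T.
    """
    n = len(lam)
    q_prev, q = [0.0] * n, list(q1)
    alpha, beta, b = [], [], 0.0
    for k in range(n):
        v = [lam[i] * q[i] for i in range(n)]          # diag(lam) times q
        a = sum(q[i] * v[i] for i in range(n))
        alpha.append(a)
        if k == n - 1:
            break
        r = [v[i] - a * q[i] - b * q_prev[i] for i in range(n)]
        b = math.sqrt(sum(x * x for x in r))
        beta.append(b)
        q_prev, q = q, [x / b for x in r]
    return alpha, beta

lam, lam_tilde = [5.0, 3.0, 1.0], [4.0, 2.0]
d = [math.sqrt(w) for w in weights(lam, lam_tilde)]
alpha, beta = lanczos_diag(lam, d)
# T(2:3, 2:3) = [[alpha[1], beta[1]], [beta[1], alpha[2]]]
trace_sub = alpha[1] + alpha[2]
det_sub = alpha[1] * alpha[2] - beta[1] ** 2
```

For a 2-by-2 trailing block the trace and determinant determine the eigenvalues, so comparing them with 4 + 2 and 4 * 2 checks that T(2:3, 2:3) has the prescribed spectrum.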
12.6.3 A Toeplitz Eigenproblem 


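Assume that

$$T = \begin{bmatrix} 1 & r^T \\ r & G \end{bmatrix}$$

is symmetric, positive definite, and Toeplitz with $r \in \mathbb{R}^{n-1}$. Our goal is to
compute the smallest eigenvalue $\lambda_{\min}(T)$ of $T$ given that

$$\lambda_{\min}(T) < \lambda_{\min}(G).$$

This problem is considered in Cybenko and Van Loan (1986) and has
applications in signal processing.

Suppose

$$\begin{bmatrix} 1 & r^T \\ r & G \end{bmatrix} \begin{bmatrix} \alpha \\ y \end{bmatrix} = \lambda \begin{bmatrix} \alpha \\ y \end{bmatrix},$$

i.e.,

$$\alpha + r^Ty = \lambda\alpha, \qquad \alpha r + Gy = \lambda y.$$

If $\lambda \notin \lambda(G)$, then $y = -\alpha(G - \lambda I)^{-1}r$, $\alpha \ne 0$, and

$$\alpha + r^T\left[-\alpha(G - \lambda I)^{-1}r\right] = \lambda\alpha.$$

Thus, $\lambda$ is a zero of the rational function

$$f(\lambda) = 1 - \lambda - r^T(G - \lambda I)^{-1}r.$$

We have dealt with similar functions in §8.5 and §12.1. In this case, $f$
always has a negative slope

$$f'(\lambda) = -1 - \| (G - \lambda I)^{-1}r \|_2^2 < -1.$$

If $\lambda < \lambda_{\min}(G)$, then it also has a negative second derivative:

$$f''(\lambda) = -2r^T(G - \lambda I)^{-3}r \le 0.$$

Using these facts it can be shown that if

$$\lambda_{\min}(T) \le \lambda^{(0)} < \lambda_{\min}(G), \qquad (12.6.2)$$

then the Newton iteration

$$\lambda^{(k+1)} = \lambda^{(k)} - \frac{f(\lambda^{(k)})}{f'(\lambda^{(k)})} \qquad (12.6.3)$$

converges to $\lambda_{\min}(T)$ monotonically from the right. Note that

$$\lambda^{(k+1)} = \lambda^{(k)} + \frac{1 + r^Tw - \lambda^{(k)}}{1 + w^Tw}$$

where $w$ solves the "shifted" Yule-Walker system

$$(G - \lambda^{(k)}I)w = -r.$$

Since $\lambda^{(k)} < \lambda_{\min}(G)$, this system is positive definite and Algorithm 4.7.1
is applicable if we simply apply it to the normalized Toeplitz matrix
$(G - \lambda^{(k)}I)/(1 - \lambda^{(k)})$.

A starting value that satisfies (12.6.2) can be obtained by examining
the Durbin algorithm when it is applied to $T_\lambda = (T - \lambda I)/(1 - \lambda)$. For
this matrix the "r" vector is $r/(1 - \lambda)$ and so the Durbin algorithm (4.7.1)
transforms to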
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r —r/(1-— 2) 
    y^(1) = −r_1
    for k = 1:n−1
        β_k = 1 + [r^(k)]^T y^(k)                                  (12.6.4)
        α_k = −(r_{k+1} + [r^(k)]^T E_k y^(k)) / β_k
        y^(k+1) = [ y^(k) + α_k E_k y^(k) ; α_k ]
    end


From the discussion in §4.7.2 we know that $\beta_1,\ldots,\beta_k > 0$ implies that
$T_\lambda(1{:}k+1, 1{:}k+1)$ is positive definite. Hence, a suitably modified (12.6.4)
can be used to compute $m(\lambda)$, the largest index $m$ such that $\beta_1,\ldots,\beta_m$ are
all positive but that $\beta_{m+1} \le 0$. Note that if $m(\lambda) = n - 2$, then (12.6.2)
holds. This suggests the following bisection scheme:

    Choose L and R so L ≤ λ_min(T) < λ_min(G) ≤ R.
    Until m = n − 2
        λ = (L + R)/2
        m = m(λ)
        if m < n − 2                                              (12.6.5)
            R = λ
        end
        if m = n − 1
            L = λ
        end
    end

The bracketing interval $[L, R]$ always contains a $\lambda$ such that $m(\lambda) = n - 2$
and so the current $\lambda$ has this property upon termination.
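The modified Durbin recursion and the bisection loop can be sketched together. This is a minimal sketch under stated assumptions: T is described by its first row `t = [1, r_1, ..., r_{n-1}]`, the starting interval is the choice L = 0, R = 1 - |r_1|, the iteration cap `max_iter` is an added safeguard, and the names `m_of_lambda` and `starting_value` are hypothetical.

```python
def m_of_lambda(t, lam):
    """Number of leading positive beta_k when Durbin is run on (T - lam*I)/(1 - lam)."""
    n = len(t)
    r = [x / (1.0 - lam) for x in t[1:]]
    y = [-r[0]]
    m = 0
    for k in range(1, n):
        beta = 1.0 + sum(r[i] * y[i] for i in range(k))
        if beta <= 0.0:
            break
        m = k
        if k == n - 1:
            break
        # r^(k)^T E_k y^(k) pairs r_i with the reversed entries of y
        alpha = -(r[k] + sum(r[i] * y[k - 1 - i] for i in range(k))) / beta
        y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return m

def starting_value(t, max_iter=200):
    """Bisection in the spirit of (12.6.5); returns lam with m(lam) = n - 2."""
    n = len(t)
    left, right = 0.0, 1.0 - abs(t[1])
    for _ in range(max_iter):
        lam = 0.5 * (left + right)
        m = m_of_lambda(t, lam)
        if m == n - 2:
            return lam
        if m < n - 2:
            right = lam
        else:               # m == n - 1: the shifted matrix is still positive definite
            left = lam
    raise RuntimeError("no starting value found within max_iter bisections")

# Usage: the 3-by-3 Toeplitz matrix with first row (1, 0.5, 0.25).
lam0 = starting_value([1.0, 0.5, 0.25])
```

For this example the smallest eigenvalue of T is about 0.4069 and the smallest eigenvalue of G is 0.5, so any returned value must land between the two.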

There are several possible choices for a starting interval. One idea is to
set $L = 0$ and $R = 1 - |r_1|$ since

$$0 < \lambda_{\min}(T) < \lambda_{\min}(G) \le \lambda_{\min}\left(\begin{bmatrix} 1 & r_1 \\ r_1 & 1 \end{bmatrix}\right) = 1 - |r_1|$$

where the upper bound follows from Theorem 8.1.7.

Note that the iterations in (12.6.4) and (12.6.5) involve at most $O(n^2)$
flops. A heuristic argument that $O(\log n)$ iterations are required is given
in Cybenko and Van Loan (1986).


12.6.4 An Orthogonal Matrix Eigenproblem 


Computing the eigenvalues and eigenvectors of a real orthogonal matrix
$A \in \mathbb{R}^{n\times n}$ is a problem that arises in signal processing; see Cybenko (1985).
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The eigenvalues of A are on the unit circle and moreover, 


$$\cos(\theta) + i\sin(\theta) \in \lambda(A) \ \Longrightarrow\ \cos(\theta) \in \lambda\left(\frac{A + A^T}{2}\right).$$

This suggests computing $\mathrm{Re}(\lambda(A))$ via the Schur decomposition

$$Q^T\left(\frac{A + A^T}{2}\right)Q = \mathrm{diag}(\cos(\theta_1),\ldots,\cos(\theta_n))$$

and then computing $\mathrm{Im}(\lambda(A))$ with the formula $s = \sqrt{1 - c^2}$. Unfortunately,
if $|c| \approx 1$, then this formula does not produce an accurate sine
because of floating point cancellation. We could work with the
skew-symmetric matrix $(A - A^T)/2$ to get the "small sine" eigenvalues, but then
we are talking about a method that requires a pair of full Schur
decomposition problems and the approach begins to lose its appeal.
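The cancellation is easy to observe in IEEE double precision. The snippet below is only an illustration of the numerical point: the angle 1e-8 is an arbitrary choice, and the half-angle sine is computed directly from the angle here, whereas in the method described next it would be obtained as a singular value.

```python
import math

theta = 1.0e-8                       # an angle whose cosine is very close to 1
c = math.cos(theta)

# Naive route: recover the sine from the cosine.
naive_sine = math.sqrt(max(0.0, 1.0 - c * c))
naive_error = abs(naive_sine - math.sin(theta)) / math.sin(theta)

# Half-angle route: recover the angle itself from sin(theta/2).
half_sine = math.sin(theta / 2.0)
recovered_theta = 2.0 * math.asin(half_sine)
half_error = abs(recovered_theta - theta) / theta
```

Here `c` agrees with 1 to roughly working precision, so `1 - c*c` retains essentially no correct digits of the true value near 1e-16, while the half-angle sine carries the angle to full relative accuracy.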

A way around these difficulties that involves an interesting SVD ap- 
plication is proposed by Ammar, Gragg, and Reichel (1986). We present 
just the eigenvalue portion of their algorithm. The derivation is instructive 
because it involves practically every decomposition that we have studied. 

The first step is to orthogonally reduce A to upper Hessenberg form, 
QT AQ = H. (Frequently, A is already in Hessenberg form.) Without loss 
of generality, we may assume that H is unreduced with positive subdiagonal 
elements. 

If n is odd, then it must have a real eigenvalue because the eigenvalues 
of a real matrix come in complex conjugate pairs. In this case it is possible 
to deflate the problem with O(n) work to size n — 1 by carefully working 
with the eigenvector equation $Hx = x$ (or $Hx = -x$). See Gragg (1986).
Thus, we may assume that n is even. 

For $1 \le k \le n - 1$, define the reflection $G_k \in \mathbb{R}^{n\times n}$ by

$$G_k = G_k(\phi_k) = \begin{bmatrix}
I_{k-1} & 0 & 0 & 0\\
0 & -c_k & s_k & 0\\
0 & s_k & c_k & 0\\
0 & 0 & 0 & I_{n-k-1}
\end{bmatrix}$$

where $c_k = \cos(\phi_k)$, $s_k = \sin(\phi_k)$, and $0 < \phi_k < \pi$. It is possible to
determine $G_1,\ldots,G_{n-1}$ such that

$$H = (G_1 \cdots G_{n-1})\,\mathrm{diag}(1,\ldots,1,-c_n)$$

where $c_n = \pm 1$. This is just the QR decomposition of $H$. The sines
$s_1,\ldots,s_{n-1}$ are the subdiagonal entries of $H$. The "R" matrix is diagonal
because it is orthogonal and triangular. Since the determinant of a reflection
is $-1$, $\det(H) = c_n$. This quantity is the product of $H$'s eigenvalues and so
if $c_n = -1$, then $\{-1, 1\} \subseteq \lambda(H)$. In this situation it is also possible to
deflate.
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So altogether we may assume that n is even and 


$$H = G_1(\phi_1) \cdots G_{n-1}(\phi_{n-1})G_n(\phi_n)$$

where $G_n = G_n(\phi_n) = \mathrm{diag}(1,\ldots,1,-c_n)$ and $c_n = 1$. Designate the
sought after eigenvalues by

$$\lambda(H) = \{\cos(\theta_k) \pm i\sin(\theta_k)\}_{k=1}^{m} \qquad (12.6.4)$$

where $m = n/2$.

The cosines $c_1,\ldots,c_n$ are called the Schur parameters and as we
mentioned, the corresponding sines are the subdiagonal entries of $H$. Using
these numbers it is possible to construct explicitly a pair of bidiagonal
matrices $B_C, B_S \in \mathbb{R}^{n\times n}$ with the property that

$$\sigma(B_C(1{:}m, 1{:}m)) = \{\cos(\theta_1/2),\ldots,\cos(\theta_m/2)\} \qquad (12.6.5)$$
$$\sigma(B_S(1{:}m, 1{:}m)) = \{\sin(\theta_1/2),\ldots,\sin(\theta_m/2)\} \qquad (12.6.6)$$

The singular values of $B_C(1{:}m, 1{:}m)$ and $B_S(1{:}m, 1{:}m)$ can be computed
using the bidiagonal SVD algorithm. The angle $\theta_k$ can be accurately
computed from $\sin(\theta_k/2)$ if $0 < \theta_k \le \pi/2$ and accurately computed from
$\cos(\theta_k/2)$ if $\pi/2 \le \theta_k < \pi$. The construction of $B_C$ and $B_S$ is based
on three facts:


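1. $H$ is similar to

$$\tilde{H} = H_oH_e$$

where $H_o$ and $H_e$ are the odd and even reflection products

$$H_o = G_1G_3 \cdots G_{n-1}, \qquad H_e = G_2G_4 \cdots G_n.$$

These matrices are block diagonal with 2-by-2 and 1-by-1 blocks, i.e.,

$$H_o = \mathrm{diag}(R(\phi_1), R(\phi_3), \ldots, R(\phi_{n-1})) \qquad (12.6.7)$$
$$H_e = \mathrm{diag}(1, R(\phi_2), R(\phi_4), \ldots, R(\phi_{n-2}), -1) \qquad (12.6.8)$$

where

$$R(\phi) = \begin{bmatrix} -\cos(\phi) & \sin(\phi) \\ \sin(\phi) & \cos(\phi) \end{bmatrix}. \qquad (12.6.9)$$

2. The eigenvalues of the symmetric tridiagonal matrices

$$C = \frac{H_o + H_e}{2} \quad\text{and}\quad S = \frac{H_o - H_e}{2} \qquad (12.6.10)$$

are given by

$$\lambda(C) = \{\pm\cos(\theta_1/2),\ldots,\pm\cos(\theta_m/2)\} \qquad (12.6.11)$$
$$\lambda(S) = \{\pm\sin(\theta_1/2),\ldots,\pm\sin(\theta_m/2)\}. \qquad (12.6.12)$$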
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3. It is possible to construct bidiagonalizations 
$$U_C^TCV_C = B_C \quad\text{and}\quad U_S^TSV_S = B_S$$

that satisfy (12.6.5) and (12.6.6). The transformations $U_C$, $V_C$, $U_S$,
and $V_S$ are products of known reflections $G_k$ and simple permutations.

We begin the verification of these three facts by showing that $H$ is
similar to $H_oH_e$. The $n = 8$ case is sufficient for this purpose. Define the
orthogonal matrix $P$ by

$$P = F_7F_5F_3 \quad\text{where}\quad F_3 = G_3G_4G_5G_6G_7G_8,\quad F_5 = G_5G_6G_7G_8,\quad F_7 = G_7G_8.$$

Since reflections are symmetric and $G_iG_j = G_jG_i$ if $|i - j| \ge 2$, we see that

$$\begin{aligned}
F_3HF_3^T &= (G_3G_4G_5G_6G_7G_8)(G_1G_2G_3G_4G_5G_6G_7G_8)(G_3G_4G_5G_6G_7G_8)^T\\
&= (G_3G_4G_5G_6G_7G_8)G_1G_2\\
&= G_1G_3G_2G_4G_5G_6G_7G_8,\\[4pt]
F_5(F_3HF_3^T)F_5^T &= (G_5G_6G_7G_8)(G_1G_3G_2G_4G_5G_6G_7G_8)(G_5G_6G_7G_8)^T\\
&= (G_5G_6G_7G_8)G_1G_3G_2G_4\\
&= G_1G_3G_5G_2G_4G_6G_7G_8,\\[4pt]
PHP^T &= F_7(F_5F_3HF_3^TF_5^T)F_7^T\\
&= (G_7G_8)(G_1G_3G_5G_2G_4G_6G_7G_8)(G_7G_8)^T\\
&= (G_1G_3G_5G_7)(G_2G_4G_6G_8) = H_oH_e.
\end{aligned}$$


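The second of the three facts that we need to establish relates the
eigenvalues of $\tilde{H} = H_oH_e$ to the eigenvalues of the $C$ and $S$ matrices defined
in (12.6.10). It follows from (12.6.7) and (12.6.8) that these matrices are
symmetric, tridiagonal, and unreduced, e.g., for $n = 4$ (so that $c_4 = 1$),

$$C = \frac{1}{2}\begin{bmatrix}
1 - c_1 & s_1 & 0 & 0\\
s_1 & c_1 - c_2 & s_2 & 0\\
0 & s_2 & c_2 - c_3 & s_3\\
0 & 0 & s_3 & c_3 - c_4
\end{bmatrix}, \qquad
S = \frac{1}{2}\begin{bmatrix}
-1 - c_1 & s_1 & 0 & 0\\
s_1 & c_1 + c_2 & -s_2 & 0\\
0 & -s_2 & -c_2 - c_3 & s_3\\
0 & 0 & s_3 & c_3 + c_4
\end{bmatrix}.$$

By working with the definitions it is easy to verify that

$$\frac{\tilde{H} + \tilde{H}^T}{2} = \frac{H_oH_e + (H_oH_e)^T}{2} = \frac{H_oH_e + H_eH_o}{2} = 2C^2 - I$$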
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and 


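$$\frac{\tilde{H} - \tilde{H}^T}{2} = \frac{H_oH_e - (H_oH_e)^T}{2} = \frac{H_oH_e - H_eH_o}{2} = -2CS.$$

This shows that $\mathrm{Re}(\lambda(H)) = \lambda(2C^2 - I)$ and $\mathrm{Im}(\lambda(H)) = \lambda(-2iCS)$
thereby establishing (12.6.11) and (12.6.12).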

Instead of thinking of these half-angle cosines and sines as eigenvalues 
of n-by-n matrices, it is more efficient to think of them as singular values 
of m-by-m matrices. This brings us to the bidiagonalization of C and S. 
The orthogonal equivalence transformations that carry out this task are 
based upon the Schur decompositions of $H_o$ and $H_e$. A 2-by-2 reflection
$R(\phi)$ defined by (12.6.9) has eigenvalues 1 and $-1$ and the following Schur
decomposition:


R(6/2)R(9) (4/2) = | A | | 


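Thus, if

$$Q_o = \mathrm{diag}(R(\phi_1/2), R(\phi_3/2), \ldots, R(\phi_{n-1}/2))$$
$$Q_e = \mathrm{diag}(1, R(\phi_2/2), R(\phi_4/2), \ldots, R(\phi_{n-2}/2), -1)$$

then from (12.6.7) and (12.6.8) $H_o$ and $H_e$ have the following Schur
decompositions:

$$Q_oH_oQ_o = D_o = \mathrm{diag}(1, -1, 1, -1, \ldots, 1, -1)$$
$$Q_eH_eQ_e = D_e = \mathrm{diag}(1, 1, -1, 1, -1, \ldots, 1, -1, -1).$$

The matrices

$$C^{(1)} = Q_oCQ_e = \tfrac{1}{2}Q_o(H_o + H_e)Q_e = \tfrac{1}{2}\left(D_o(Q_oQ_e) + (Q_oQ_e)D_e\right)$$
$$S^{(1)} = Q_oSQ_e = \tfrac{1}{2}Q_o(H_o - H_e)Q_e = \tfrac{1}{2}\left(D_o(Q_oQ_e) - (Q_oQ_e)D_e\right)$$

have the same singular values as $C$ and $S$ respectively. To analyze their
structure we first note that $Q_oQ_e$ is banded:

$$Q_oQ_e = \begin{bmatrix}
\times & \times & \times & 0 & 0 & 0 & 0 & 0\\
\times & \times & \times & 0 & 0 & 0 & 0 & 0\\
0 & \times & \times & \times & \times & 0 & 0 & 0\\
0 & \times & \times & \times & \times & 0 & 0 & 0\\
0 & 0 & 0 & \times & \times & \times & \times & 0\\
0 & 0 & 0 & \times & \times & \times & \times & 0\\
0 & 0 & 0 & 0 & 0 & \times & \times & \times\\
0 & 0 & 0 & 0 & 0 & \times & \times & \times
\end{bmatrix}$$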
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(The main ideas from this point on are amply communicated with n = 8 
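examples.) If $D_o(i,i)$ and $D_e(j,j)$ have the opposite sign, then $c^{(1)}_{ij} = 0$
from which we conclude that $C^{(1)}$ has the form

$$C^{(1)} = Q_oCQ_e = \begin{bmatrix}
a_1 & b_1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & b_2 & 0 & 0 & 0 & 0 & 0\\
0 & a_2 & 0 & b_3 & 0 & 0 & 0 & 0\\
0 & 0 & a_3 & 0 & b_4 & 0 & 0 & 0\\
0 & 0 & 0 & a_4 & 0 & b_5 & 0 & 0\\
0 & 0 & 0 & 0 & a_5 & 0 & b_6 & 0\\
0 & 0 & 0 & 0 & 0 & a_6 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & a_7 & b_8
\end{bmatrix}$$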

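Analogously, if $D_o(i,i)$ and $D_e(j,j)$ have the same sign, then $s^{(1)}_{ij} = 0$ from
which we conclude that $S^{(1)}$ has the form

$$S^{(1)} = Q_oSQ_e = \begin{bmatrix}
0 & 0 & f_1 & 0 & 0 & 0 & 0 & 0\\
e_2 & d_2 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & d_3 & 0 & f_3 & 0 & 0 & 0\\
0 & e_4 & 0 & d_4 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & d_5 & 0 & f_5 & 0\\
0 & 0 & 0 & e_6 & 0 & d_6 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & d_7 & f_7\\
0 & 0 & 0 & 0 & 0 & e_8 & 0 & 0
\end{bmatrix}$$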


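Row/column permutations of these matrices result in bidiagonal forms:

$$B_C = C^{(1)}([1\ 3\ 5\ 7\ 2\ 4\ 6\ 8],\,[1\ 2\ 4\ 6\ 3\ 5\ 7\ 8]) =
\left[\begin{array}{cccc|cccc}
a_1 & b_1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & a_2 & b_3 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & a_4 & b_5 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & a_6 & 0 & 0 & 0 & 0\\ \hline
0 & 0 & 0 & 0 & b_2 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & a_3 & b_4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & a_5 & b_6 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & a_7 & b_8
\end{array}\right]$$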
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Bs 


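$$B_S = S^{(1)}([2\ 4\ 6\ 8\ 1\ 3\ 5\ 7],\,[1\ 2\ 4\ 6\ 3\ 5\ 7\ 8]) =
\left[\begin{array}{cccc|cccc}
e_2 & d_2 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & e_4 & d_4 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & e_6 & d_6 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & e_8 & 0 & 0 & 0 & 0\\ \hline
0 & 0 & 0 & 0 & f_1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & d_3 & f_3 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & d_5 & f_5 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & d_7 & f_7
\end{array}\right]$$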

It is not hard to verify that the $a$'s, $b$'s, $d$'s, $e$'s, and $f$'s are all nonzero and
this implies that the singular values of $B_C(1{:}m, 1{:}m)$ and $B_S(1{:}m, 1{:}m)$ are
distinct. Since

$$\sigma(C) = \sigma(B_C) = \{\cos(\theta_1/2), \cos(\theta_1/2), \ldots, \cos(\theta_m/2), \cos(\theta_m/2)\}$$
$$\sigma(S) = \sigma(B_S) = \{\sin(\theta_1/2), \sin(\theta_1/2), \ldots, \sin(\theta_m/2), \sin(\theta_m/2)\}$$

we have verified (12.6.5) and (12.6.6).


Problems 


P12.6.1 Let $A \in \mathbb{R}^{m\times n}$ and consider the problem of finding the stationary values of

$$R(x, y) = \frac{y^TAx}{\| y \|_2 \| x \|_2}, \qquad y \in \mathbb{R}^m,\ x \in \mathbb{R}^n,$$

subject to the constraints

$$C^Tx = 0,\quad C \in \mathbb{R}^{n\times p},\ n \ge p, \qquad D^Ty = 0,\quad D \in \mathbb{R}^{m\times q},\ m \ge q.$$

Show how to solve this problem by first computing complete orthogonal decompositions
of $C$ and $D$ and then computing the SVD of a certain submatrix of a transformed $A$.

P12.6.2 Suppose $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times n}$. Assume that $\mathrm{rank}(A) = n$ and $\mathrm{rank}(B) = p$.
Using the methods of this section, show how to solve

$$\min_{Bx=0} \frac{\| b - Ax \|_2^2}{x^Tx + 1} = \min_{Bx=0} \frac{\left\| [\,A\ \ b\,] \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2}{\left\| \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2}.$$

Show that this is a constrained TLS problem. Is there always a solution?

P12.6.3 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that $B \in \mathbb{R}^{p\times n}$ has rank $p$. Let $d \in \mathbb{R}^p$.
Show how to solve the problem of minimizing $x^TAx$ subject to the constraints $\| x \|_2 = 1$
and $Bx = d$. Indicate when a solution fails to exist.

P12.6.4 Assume that $A \in \mathbb{R}^{n\times n}$ is symmetric, large, and sparse and that $C \in \mathbb{R}^{n\times p}$ is
also large and sparse. How can the Lanczos process be used to find the stationary values
of

$$r(x) = \frac{x^TAx}{x^Tx}$$
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subject to the constraint CT z = 0? Assume that a sparse QR factorization C = QR is 
available. 


P12.6.5 Relate the eigenvalues and eigenvectors of

$$\tilde{A} = \begin{bmatrix}
0 & A_1 & 0 & 0\\
0 & 0 & A_2 & 0\\
0 & 0 & 0 & A_3\\
A_4 & 0 & 0 & 0
\end{bmatrix}$$

to the eigenvalues and eigenvectors of $A = A_1A_2A_3A_4$. Assume that the diagonal blocks
in $\tilde{A}$ are square.

P12.6.6 Prove that if (12.6.2) holds, then (12.6.3) converges to $\lambda_{\min}(T)$ monotonically
from the right.

P12.6.7 Recall from §4.7 that it is possible to compute the inverse of a symmetric
positive definite Toeplitz matrix in $O(n^2)$ flops. Use this fact to obtain an initial bracketing
interval for (12.6.5) that is based on $\| T^{-1} \|_\infty$ and $\| G^{-1} \|_\infty$.

P12.6.8 A matrix $A \in \mathbb{R}^{n\times n}$ is centrosymmetric if it is symmetric and persymmetric,
i.e., $A = E_nAE_n$ where $E_n = I_n(:, n{:}-1{:}1)$. Show that if $n = 2m$ and $Q$ is the
orthogonal matrix

$$Q = \frac{1}{\sqrt{2}} \begin{bmatrix} I_m & I_m \\ E_m & -E_m \end{bmatrix}$$

then

$$Q^TAQ = \begin{bmatrix} A_{11} + A_{12}E_m & 0 \\ 0 & A_{11} - A_{12}E_m \end{bmatrix}$$

where $A_{11} = A(1{:}m, 1{:}m)$ and $A_{12} = A(1{:}m, m+1{:}n)$. Show that if $n = 2m$, then the
Schur decomposition of a centrosymmetric matrix can be computed with one-fourth the
flops that it takes to compute the Schur decomposition of a symmetric matrix, assuming
that the QR algorithm is used in both cases. Repeat the problem if $n = 2m + 1$.

P12.6.9 Suppose $F, G \in \mathbb{R}^{n\times n}$ are symmetric and that

$$Q = [\,Q_1\ \ Q_2\,], \qquad Q_1 \in \mathbb{R}^{n\times p},\ Q_2 \in \mathbb{R}^{n\times(n-p)},$$

is an $n$-by-$n$ orthogonal matrix. Show how to compute $Q$ and $p$ so that

$$f(Q, p) = \mathrm{tr}(Q_1^TFQ_1) + \mathrm{tr}(Q_2^TGQ_2)$$

is maximized. Hint: $\mathrm{tr}(Q_1^TFQ_1) + \mathrm{tr}(Q_2^TGQ_2) = \mathrm{tr}(Q_1^T(F - G)Q_1) + \mathrm{tr}(G)$.

P12.6.10 Suppose $A \in \mathbb{R}^{n\times n}$ is given and consider the problem of minimizing $\| A - S \|_F$
over all symmetric positive semidefinite matrices $S$ that have rank $r$ or less. Show that

$$S = \sum_{i=1}^{\min(k,r)} \lambda_i q_i q_i^T$$

solves this problem where

$$\frac{A + A^T}{2} = Q\,\mathrm{diag}(\lambda_1,\ldots,\lambda_n)\,Q^T$$

is the Schur decomposition of $A$'s symmetric part, $Q = [\,q_1,\ldots,q_n\,]$, and

$$\lambda_1 \ge \cdots \ge \lambda_k > 0 \ge \lambda_{k+1} \ge \cdots \ge \lambda_n.$$

P12.6.11 Verify for general $n$ (even) that $H$ is similar to $H_oH_e$ where these matrices
are defined in §12.6.4.

P12.6.12 Verify that the bidiagonal matrices $B_C(1{:}m, 1{:}m)$ and $B_S(1{:}m, 1{:}m)$ in §12.6.4
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have nonzero entries on their diagonal and superdiagonal and specify their value. 
P12.6.13 A real $2n$-by-$2n$ matrix of the form

$$M = \begin{bmatrix} A & G \\ F & -A^T \end{bmatrix}$$

is Hamiltonian if $A \in \mathbb{R}^{n\times n}$ and $F, G \in \mathbb{R}^{n\times n}$ are symmetric. Equivalently, if the
orthogonal matrix $J$ is defined by

$$J = \begin{bmatrix} 0 & I_n \\ -I_n & 0 \end{bmatrix},$$

then $M \in \mathbb{R}^{2n\times 2n}$ is Hamiltonian if and only if $J^TMJ = -M^T$. (a) Show that the
eigenvalues of a Hamiltonian matrix come in plus-minus pairs. (b) A matrix $S \in \mathbb{R}^{2n\times 2n}$
is symplectic if $J^TSJ = S^{-T}$. Show that if $S$ is symplectic and $M$ is Hamiltonian, then
$S^{-1}MS$ is also Hamiltonian. (c) Show that if $Q \in \mathbb{R}^{2n\times 2n}$ is orthogonal and symplectic,
then

$$Q = \begin{bmatrix} Q_1 & Q_2 \\ -Q_2 & Q_1 \end{bmatrix}$$

where $Q_1^TQ_1 + Q_2^TQ_2 = I_n$ and $Q_2^TQ_1$ is symmetric. Thus, a Givens rotation of the
form $G(i, i+n, \theta)$ is orthogonal symplectic as is the direct sum of $n$-by-$n$ Householders.
(d) Show how to compute a symplectic orthogonal $U$ such that

$$U^TMU = \begin{bmatrix} H & R \\ D & -H^T \end{bmatrix}$$

where $H$ is upper Hessenberg and $D$ is diagonal.


Notes and References for Sec. 12.6 


The inverse eigenvalue problems discussed in §12.6.1 and §12.6.2 appear in the
following survey articles:


G.H. Golub (1973). “Some Modified Matrix Eigenvalue Problems,” SIAM Review 15, 
318-44. 

D. Boley and G.H. Golub (1987). “A Survey of Matrix Inverse Eigenvalue Problems,” 
Inverse Problems 3, 595—622. 


References for the stationary value problem include 


G.E. Forsythe and G.H. Golub (1965). “On the Stationary Values of a Second-Degree 
Polynomial on the Unit Sphere,” SIAM J. App. Math. 18, 1050-68. 

G.H. Golub and R. Underwood (1970). “Stationary Values of the Ratio of Quadratic 
Forms Subject to Linear Constraints," Z. Angew. Math. Phys. 21, 318-26. 

S. Leon (1994). “Maximizing Bilinear Forms Subject to Linear Constraints," Lin. Alg. 
and Its Applic. 210, 49-58. 


An algorithm for minimizing zT Az where z satisfies Bx = d and || x ||2=1 is presented in 


W. Gander, G.H. Golub, and U. von Matt (1991). *A Constrained Eigenvalue Problem," 
in Numerical Linear Algebra, Digital Signal Processing, and Parallel Algorithms, 
G.H. Golub and P. Van Dooren (eds), Springer-Verlag, Berlin. 


Selected papers that discuss a range of inverse eigenvalue problems include 


G.H. Golub and J.H. Welsch (1969). “Calculation of Gauss Quadrature Rules,” Math. 
Comp. 23, 221-30. 

S. Friedland (1975). “On Inverse Multiplicative Eigenvalue Problems for Matrices,” Lin. 
Alg. and Its Applic. 12, 127-38. 
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D.L. Boley and G.H. Golub (1978). “The Matrix Inverse Eigenvalue Problem for Peri- 
odic Jacobi Matrices,” in Proc. Fourth Symposium on Basic Problems of Numerical 
Mathematics, Prague, pp. 63-76. 

W.E. Ferguson (1980). “The Construction of Jacobi and Periodic Jacobi Matrices with 
Prescribed Spectra," Math. Comp. 35, 1203-1220. 

J. Kautsky and G.H. Golub (1983). “On the Calculation of Jacobi Matrices," Lin. Alg. 
and Its Applic. 52/53, 439-456. 

D. Boley and G.H. Golub (1984). “A Modified Method for Restructuring Periodic Jacobi 
Matrices,” Math. Comp. 42, 143-150. 

W.B. Gragg and W.J. Harrod (1984). “The Numerically Stable Reconstruction of Jacobi
Matrices from Spectral Data,” Numer. Math. 44, 317-336.

S. Friedland, J. Nocedal, and M.L. Overton (1987). “The Formulation and Analysis of 
Numerical Methods for Inverse Eigenvalue Problems," SIAM J. Numer. Anal. 24, 
634-667. 

M.T. Chu (1992). "Numerical Methods for Inverse Singular Value Problems," SIAM J. 
Num. Anal. 29, 885-903. 

G. Ammar and G. He (1995). “On an Inverse Eigenvalue Problem for Unitary Matrices,” 
Lin. Alg. and Its Applic. 218, 263-271. 

H. Zha and Z. Zhang (1995). “A Note on Constructing a Symmetric Matrix with Spec- 
ified Diagonal Entries and Eigenvalues,” BIT 35, 448-451. 


Various Toeplitz eigenvalue computations are presented in 


G. Cybenko and C. Van Loan (1986). “Computing the Minimum Eigenvalue of a Sym- 
metric Positive Definite Toeplitz Matrix,” SIAM J. Sci. and Stat. Comp. 7, 123-131. 

W.F. Trench (1989). “Numerical Solution of the Eigenvalue Problem for Hermitian 
Toeplitz Matrices,” SIAM J. Matrix Anal. Appl. 10, 135-146. 

L. Reichel and L.N. Trefethen (1992). “Eigenvalues and Pseudo-eigenvalues of Toeplitz 
Matrices," Lin. Alg. and Its Applic. 162/163/164, 153-186. 

S.L. Handy and J.L. Barlow (1994). "Numerical Solution of the Eigenproblem for 
Banded, Symmetric Toeplitz Matrices," SIAM J. Matrix Anal. Appl. 15, 205-214. 


Unitary/orthogonal eigenvalue problems are treated in 


H. Rutishauser (1966). “Bestimmung der Eigenwerte Orthogonaler Matrizen,” Numer. 
Math. 9, 104-108. 

P.J. Eberlein and C.P. Huang (1975). “Global Convergence of the QR Algorithm for 
Unitary Matrices with Some Results for Normal Matrices,” SIAM J. Numer. Anal. 
12, 421-453. 

G. Cybenko (1985). "Computing Pisarenko Frequency Estimates," in Proceedings of 
the Princeton Conference on Information Science and Systems, Dept. of Electrical 
Engineering, Princeton University. 

W. B. Gragg (1986). “The QR Algorithm for Unitary Hessenberg Matrices," J. Comp. 
Appl. Math. 16, 1-8. 

G.S. Ammar, W.B. Gragg, and L. Reichel (1985). “On the Eigenproblem for Orthogonal 
Matrices," Proc. IEEE Conference on Decision and Control, 1963-1966. 

W.B. Gragg and L. Reichel (1990). “A Divide and Conquer Method for Unitary and 
Orthogonal Eigenproblems," Numer. Math. 57, 695-718. 


Hamiltonian eigenproblems (see P 12.6.13) occur throughout optimal control theory and 
are very important. 


C.C. Paige and C. Van Loan (1981). “A Schur Decomposition for Hamiltonian Matrices,” 
Lin. Alg. and Its Applic. 41, 11-32. 

C. Van Loan (1984). "A Symplectic Method for Approximating All the Eigenvalues of 
a Hamiltonian Matrix," Lin. Alg. and Its Applic. 61, 233-252. 

R. Byers (1986). “A Hamiltonian QR Algorithm," SIAM J. Sci. and Stat. Comp. 7, 
212-229. 

V. Mehrmann (1988). “A Symplectic Orthogonal Method for Single Input or Single 
Output Discrete Time Optimal Quadratic Control Problems," SIAM J. Matrix Anal. 
Appl. 9, 221-247. 

G. Ammar and V. Mehrmann (1991). “On Hamiltonian and Symplectic Hessenberg 
Forms," Lin. Alg. and Its Applic. 149, 55-72. 

A. Bunse-Gerstner, R. Byers, and V. Mehrmann (1992). “A Chart of Numerical Methods 
for Structured Eigenvalue Problems,” SIAM J. Matrix Anal. Appl. 13, 419-453. 


Other papers on modified/structured eigenvalue problems include 


A. Bunse-Gerstner and W.B. Gragg (1988). “Singular Value Decompositions of Complex 
Symmetric Matrices,” J. Comp. Appl. Math. 21, 41-54. 

R. Byers (1988). “A Bisection Method for Measuring the Distance of a Stable Matrix to 
the Unstable Matrices,” SIAM J. Sci. Stat. Comp. 9, 875-881. 

J.W. Demmel and W. Gragg (1993). “On Computing Accurate Singular Values and 
Eigenvalues of Matrices with Acyclic Graphs,” Lin. Alg. and Its Applic. 185, 203- 
217. 

A. Bunse-Gerstner, R. Byers, and V. Mehrmann (1993). “Numerical Methods for Simultaneous 
Diagonalization,” SIAM J. Matrix Anal. Appl. 14, 927-949. 


MINRES, 494 
Mixed precision, 127 
Modified eigenproblems, 621-3 
Modified Gram-Schmidt, 231-2, 241 
Modified LR algorithm, 361 
Moore-Penrose conditions, 257-8 
Multiple eigenvalues, 

and Lanczos tridiagonalization, 485 

and matrix functions, 560-1 
Multiple right hand sides, 91, 121 
Multiplicity of eigenvalues, 316 
Multipliers, 96 


Neighbor, 276 
Netlib, xiv 
Network topology, 276 
Node program, 285 
Nonderogatory matrices, 349 
Nonsingular, 50 
Normal equations, 237-9, 545-7 
Normal matrix, 313-4 
Normality and eigenvalue condition, 323 
Norms 

matrix, 54ff 

vector, 52ff 
Notation 

block matrices, 24-5 

colon, 7, 19, 27 

matrix, 3 

submatrix, 27 
vector, 4 

x-o, 16 
Null, 49 
Null space, 49 

intersection of, 602-3 
Numerical rank and SVD, 260-2 


Off, 426 
Operation count. See Work or 
particular algorithm. 
Orthogonal 
basis, 69 
complement, 69 
matrix, 208 
Procrustes problem, 601 
projection, 75 
Orthogonal iteration 
Ritz acceleration and, 422 
symmetric, 410-1 
unsymmetric, 332-4 
Orthogonal matrix representations 
WY block form, 213-5 
factored form, 212-3 
Givens rotations, 217-8 
Orthonormal basis computation, 229-32 
Orthonormality, 69 
Outer product, 8 
Overdetermined system, 236 
Overflow, 61 
Overwriting, 23 


Pade approximation, 572-4 
Parallel computation 
gaxpy 
message passing ring, 279 
shared memory (dynamic), 289-90 
shared memory (static), 287 
Cholesky 
message passing ring, 300 
divide and conquer, 445-6 
Jacobi, 431-4 
matrix multiplication 
shared memory, 292-3 
torus, 293-9 
Parlett-Reid method, 162-3 
Partitioned matrix, 6 
Pencils, 375 
diagonalization of, 461-2 
equivalence of, 376 
symmetric-definite, 461 
Permutation matrices, 109-10 
Persymmetric matrix, 193 
Perturbation theory for 
eigenvalues, 320-4 
eigenvalues (symmetric case), 395-7 
eigenvectors, 326-7 
eigenvectors (symmetric case), 399-400 
generalized eigenvalue, 377-8 
invariant subspaces 
symmetric case, 397-99 
unsymmetric case, 324-5 
least squares problem, 242-4 
linear equation problem, 80ff 

pseudo-inverse, 258 

singular subspace pair, 450-1 

singular values, 449-50 

underdetermined systems, 272-3 
Pipelining, 35-6 
Pivoting, 109 

Aasen, 166 

column, 248-50 

complete, 117 

partial, 110 

symmetric matrices and, 148 
Pivots, 97 

condition and, 107 

zero, 103 
Plane rotations. See Givens rotations. 
p-norms, 52 

minimization in, 236 
Polar decomposition, 149 
Polynomial preconditioner, 539-40 
Positive definite systems, 140-1 

Gauss-Seidel and, 512 


LDL^T and, 142 
properties of, 141 
unsymmetric, 142 
Power method, 330-2 
symmetric case, 405-6 
Power series of matrix, 565 
Powers of a matrix, 569 
Preconditioned conjugate 
gradient method, 532ff 
Pre-conditioners 
incomplete block, 536-7 
incomplete Cholesky, 535 
polynomial, 539-40 
unsymmetric case, 550 
Principal angles and vectors, 603-4 
Processor id, 276 
Procrustes problem, 601 
Projections, 75 
Pseudo-eigenvalues, 576-7 
Pseudo-inverse, 257 


QMR, 551 

QR algorithm for eigenvalues 
symmetric version, 414ff 
unsymmetric version, 352ff 

QR factorization, 223ff 
Block Householder computation, 225-6 
Classical Gram-Schmidt and, 230-1 
column pivoting and, 248-50, 591 
Fast Givens computation of, 228-9 
Givens computation of, 226-7 
Hessenberg matrices and, 227-8 
Householder computation of, 224-5 
least squares problem and, 239-42 
Modified Gram-Schmidt and, 231-2 
properties of, 229-30 
rank of matrix and, 248 
square systems and, 270-1 
tridiagonal matrix and, 417 
underdetermined systems and, 271-2 
updating, 607-13 

Quadratic form, 394 

QZ algorithm, 384ff 


Range, 49 
Rank of matrix, 49 
determination of, 259 
QR factorization and, 248 
subset selection and, 591-4 
SVD and, 72-3 
Rank deficient LS problem, 256ff 
Rank-one modification 
of diagonal matrix, 442-4 
eigenvalues and, 397 
QR factorization and, 607-13 
Rayleigh quotient iteration, 408-9 
QR algorithm and, 422 
symmetric-definite pencils and, 465 
R-bidiagonalization, 552-3 
Re, 14 
Real Schur decomposition, 341 
generalized, 377 
recv, 277 
Rectangular LU, 102 
Relaxation parameter, 514 
Residuals vs. accuracy, 124 
Restarting 
Arnoldi method and, 501-3 
GMRES and, 549 
Lanczos and, 584 
Ridge regression, 583-5 
Ring, 276 
Ring algorithms 
Cholesky, 300-3 
Jacobi eigensolver, 434 
Ritz, 
acceleration, 334 
pairs and Arnoldi method, 500 
pairs and Lanczos method, 475 
Rotation of subspaces, 601 
Rounding errors. See 
particular algorithm. 
Roundoff error analysis, 62-7 
Row addition or deletion, 610-1 
Row partition, 6 
Row scaling, 125 
Row weighting in LS problem, 265 


Saxpy, 4, 5 

Scaling 
linear systems and, 125 

Scaling and squaring for exp(A), 573-4 

Schmidt orthogonalization. See 
Gram-Schmidt. 

Schur complement, 103 

Schur decomposition, 313 
generalized, 377 
matrix functions and, 558-61 
normal matrices and, 313-4 
real matrices and, 341-2 
symmetric matrices and, 393 
two-by-two symmetric, 427-8 
Schur vectors, 313 
Search directions, 521ff 
Secular equations, 443, 582 
Selective orthogonalization, 483-4 
Semidefinite systems, 147-9 
send, 277 
Sensitivity. See Perturbation 
theory for. 
Sep, 325 
Serious breakdown, 505 
Shared memory traffic, 287 
Shared memory systems, 285-9 
Sherman-Morrison formula, 50 
Shifts in 
QR algorithm, 353, 356 
QZ algorithm, 382-3 
SVD algorithm, 452 
symmetric QR algorithm, 418-20 
Sign function, 372 
Similarity transformation, 311 
condition of, 317 
nonunitary, 314, 317 
Simpson's rule, 570 
Simultaneous diagonalization, 461-3 
Simultaneous iteration. See 
LR iteration, orthogonal iteration, 
Treppeniteration. 
Sine of matrix, 566 
Single shift QR iteration, 354-5 
Singular matrix, 50 
Singular value decomposition (SVD), 70-3 
algorithm for, 253-4, 448, 452 
constrained least squares and, 582-3 
generalized, 465-7 
Lanczos method for, 495-6 
linear systems and, 80 
numerical rank and, 260-2 
null space and, 71, 602-3 
projections and, 75 
proof of, 70 
pseudo-inverse and, 257 
rank of matrix and, 71 
ridge regression and, 583-5 
subset selection and, 591-4 
subspace intersection and, 604-5 
subspace rotation and, 601 
total least squares and, 596-8 

Singular values 
eigenvalues and, 318 
interlacing properties, 449-50 
minimax characterization, 449 
perturbation of, 450-1 

Singular vectors, 70-1 

Span, 49 

Spectral radius, 511 

Spectrum, 310 

Speed-up, 281 

Splitting, 511 

Square root of a matrix, 149 

S-step Lanczos, 487 

Static scheduling, 286 
Stationary values, 621 
Steepest descent and conjugate 
gradients, 520ff 
Store by 
band, 19-20 
block, 45 
diagonal, 21-3 
Stride, 38-40 
Strassen method, 31-3, 66 
Structure exploitation, 16-24 
Sturm sequences, 440 
Submatrix, 27 
Subordinate norm, 56 
Subset selection, 590 
Subspace, 49 
angles between, 603-4 
basis for, 49 
deflating, 381, 386 
dimension, 49 
distance between, 76-7 
intersection, 604-5 
invariant, 372, 307-403 
null space intersection, 602-3 
orthogonal projections onto, 
rotation of, 601 
Successive over-relaxation (SOR), 514 
Symmetric eigenproblem, 391ff 
Symmetric indefinite systems, 161ff 
Symmetric positive definite systems, 
Lanczos and, ff 
Symmetric storage, 20-2 
Symmetric successive over-relaxation 
(SSOR), 516-7 
sym.schur, 427 
SYMMLQ, 494 
Sweep, 429 
Sylvester equation, 366-9 
Sylvester law of inertia, 403 


Taylor approximation of e^A, 565-7 
Threshold Jacobi, 436 
Toeplitz matrix methods, 193ff 
Torus, 276 
Total least squares, 595ff 
Trace, 310 
Transformation matrices 
Fast Givens, 218-21 
Gauss, 94-5 
Givens, 215 
Householder, 209 
Hyperbolic, 611-2 
Trench algorithm, 199 
Treppeniteration, 335-6 
Triangular matrices, 93 
multiplication between, 17 
unit, 92 
Triangular systems, 88ff 
band, 153-4 
multiple, 91 
non-square, 92 
Tridiagonalization, 
Householder, 414 
Krylov subspaces and, 416 

Lanczos, 472 
Tridiagonal matrices, 416 

inverse of, 537 

QR algorithm and, 417ff 
Tridiagonal systems, 156-7 


ULV updating, 613-8 
Underdetermined systems, 271-3 
Underflow, 61 

Unit roundoff, 61 

Unit stride, 38-40 

Unitary matrix, 73 

Unreduced Hessenberg matrices, 346 
Unsymmetric eigenproblem, 308ff 
Unsymmetric Lanczos method, 503-6 
Unsymmetric positive definite systems, 142 
Updating the QR factorization, 606-13 


Vandermonde systems, 183-8 
Variance-covariance matrix, 245-6 
Vector length issue, 37-8 
Vector notation, 4 
Vector norms, 52ff 
Vector operations, 4, 36 
Vector touch, 41-2 
Vector computing 
models, 37 
operations, 4, 36 
pipelining, 35-6 
Vectorization, 34ff, 157-8 


Weighting 
column, 264-5 
row, 586 
See also Scaling. 
Wielandt-Hoffman theorem for 
eigenvalues, 395 
singular values, 450 
Wilkinson shift, 418 
Work 
least squares methods, 263 
linear system methods, 270 
SVD and, 254 
Workspace, 23 
Wrap mapping, 278 
WY representation, 213-5 


Yule-Walker problem, 194 


